Leveraging an Alignment Set in Tackling Instance-Dependent Label Noise
Abstract: Noisy training labels can hurt model performance. Most approaches that address label noise assume the noise is independent of the input features. In practice, however, label noise is often feature-dependent or \textit{instance-dependent}, and therefore biased (i.e., some instances are more likely to be mislabeled than others). For example, in clinical care, female patients are more likely to be under-diagnosed for cardiovascular disease than male patients. Approaches that ignore this dependence can produce models with poor discriminative performance and, in many healthcare settings, can exacerbate health disparities. In light of these limitations, we propose a two-stage approach to learning in the presence of instance-dependent label noise. Our approach utilizes an \textit{alignment set}, a small subset of the data for which we know both the observed and ground truth labels. On several tasks, our approach leads to consistent improvements over the state-of-the-art in discriminative performance (AUROC) while mitigating bias (area under the equalized odds curve, AUEOC). For example, when predicting acute respiratory failure onset on the MIMIC-III dataset, our approach achieves a harmonic mean of AUROC and AUEOC of 0.84 (SD [standard deviation] 0.01), while that of the next best baseline is 0.81 (SD 0.01). Overall, our approach improves discriminative performance while mitigating potential bias compared to existing approaches in the presence of instance-dependent label noise.
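The abstract summarizes each method with a single number, the harmonic mean of AUROC and AUEOC. As a minimal sketch of that arithmetic only (the function name and the input values below are illustrative assumptions, not taken from the paper, and how AUEOC itself is computed is not shown here), the combination of two scores can be written as:

```python
def harmonic_mean(auroc: float, aueoc: float) -> float:
    """Harmonic mean of two non-negative scores: 2ab / (a + b).

    The harmonic mean is pulled toward the smaller of the two values,
    so a model cannot score well by doing well on only one metric.
    """
    if auroc < 0 or aueoc < 0:
        raise ValueError("scores must be non-negative")
    if auroc + aueoc == 0:
        # Both scores are zero; define the harmonic mean as zero.
        return 0.0
    return 2 * auroc * aueoc / (auroc + aueoc)


# Hypothetical inputs chosen for illustration; these are NOT results from the paper.
example = harmonic_mean(0.90, 0.80)
print(f"{example:.2f}")  # prints 0.85
```

Because the harmonic mean is dominated by the lower score, a method with high AUROC but low AUEOC (or the reverse) is penalized more than it would be under an arithmetic mean.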