Tournament Leave-pair-out Cross-validation for Receiver Operating Characteristic (ROC) Analysis

Published 29 Jan 2018 in stat.ML (arXiv:1801.09386v2)

Abstract: Receiver operating characteristic (ROC) analysis is widely used for evaluating diagnostic systems. Recent studies have shown that estimating the area under the ROC curve (AUC) with standard cross-validation methods suffers from a large bias. Leave-pair-out (LPO) cross-validation has been shown to correct this bias. However, while LPO produces an almost unbiased estimate of AUC, it does not provide the ranking of the data needed for plotting and analyzing the ROC curve. In this study, we propose a new method called tournament leave-pair-out (TLPO) cross-validation. This method extends LPO by building a tournament from the pairwise comparisons to produce a ranking of the data. TLPO preserves the advantage of LPO for estimating AUC, while also allowing ROC analysis to be performed. Using both synthetic and real-world data, we show that TLPO is as reliable as LPO for AUC estimation, and we confirm the bias of leave-one-out cross-validation on low-dimensional data. As a case study on ROC analysis, we also evaluate how reliably sensitivity and specificity can be estimated from TLPO ROC curves.
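The abstract describes TLPO as a round-robin extension of leave-pair-out cross-validation: each pair of samples is held out in turn, a model trained on the remaining data scores both held-out samples, the higher-scoring sample wins the round, and sorting by win counts gives the ranking used for the ROC curve. The sketch below is an illustrative reading of that description, not the authors' implementation; the estimator choice (scikit-learn's RidgeClassifier) and the names tlpo_ranking and wins are assumptions made for the example.

```python
# Minimal sketch of tournament leave-pair-out (TLPO) cross-validation,
# assuming a scikit-learn-style estimator with fit() and decision_function().
# Names (tlpo_ranking, wins) are illustrative, not from the paper.
from itertools import combinations

import numpy as np
from sklearn.linear_model import RidgeClassifier


def tlpo_ranking(X, y, make_model=lambda: RidgeClassifier(alpha=1.0)):
    """Return per-sample win counts from a round-robin leave-pair-out tournament."""
    n = len(y)
    wins = np.zeros(n)
    # Hold out every pair of samples, train on the rest, and award a win to
    # the held-out sample that receives the higher predicted score.
    for i, j in combinations(range(n), 2):
        mask = np.ones(n, dtype=bool)
        mask[[i, j]] = False
        model = make_model().fit(X[mask], y[mask])
        s_i, s_j = model.decision_function(X[[i, j]])
        wins[i if s_i > s_j else j] += 1
    return wins


# Usage: the win counts act as prediction scores, so standard ROC tooling
# can be applied to them directly.
# from sklearn.metrics import roc_curve, roc_auc_score
# wins = tlpo_ranking(X, y)
# fpr, tpr, thresholds = roc_curve(y, wins)   # ROC curve from the tournament ranking
# auc = roc_auc_score(y, wins)                # AUC estimate from the same ranking
```

Note that a full round-robin tournament requires one model fit per pair, i.e. n(n-1)/2 fits, so a naive implementation like this is practical only for small data sets or for learners with fast retraining.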
