
Reliability and Interpretability in Science and Deep Learning

Published 14 Jan 2024 in cs.AI, cs.LG, and physics.hist-ph (arXiv:2401.07359v3)

Abstract: In recent years, the question of the reliability of Machine Learning (ML) methods has acquired significant importance, and the analysis of the associated uncertainties has motivated a growing amount of research. However, most of these studies have applied standard error analysis to ML models, and in particular Deep Neural Network (DNN) models, which represent a rather significant departure from standard scientific modelling. It is therefore necessary to integrate standard error analysis with a deeper epistemological analysis of the possible differences between DNN models and standard scientific modelling, and of the possible implications of these differences for the assessment of reliability. This article offers several contributions. First, it emphasises the ubiquitous role of model assumptions (both in ML and traditional science) against the illusion of theory-free science. Second, model assumptions are analysed from the point of view of their (epistemic) complexity, which is shown to be language-independent. It is argued that the high epistemic complexity of DNN models hinders the estimation of their reliability and also their prospect of long-term progress. Some potential ways forward are suggested. Third, this article identifies the close relation between a model's epistemic complexity and its interpretability, as introduced in the context of responsible AI. This clarifies in which sense, and to what extent, the lack of understanding of a model (the black-box problem) impacts its interpretability in a way that is independent of individual skills. It also clarifies how interpretability is a precondition for assessing the reliability of any model, which cannot be based on statistical analysis alone. This article focuses on the comparison between traditional scientific models and DNN models, but Random Forest and Logistic Regression models are also briefly considered.
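As a rough, hypothetical illustration of the complexity gap the abstract draws between simple statistical models and DNNs, one can compare free-parameter counts, a crude proxy for the paper's (more refined) notion of epistemic complexity. The input dimension and layer sizes below are assumptions for illustration, not figures from the paper.

```python
# Crude parameter-count comparison: logistic regression vs. a small DNN.
# Parameter count is only a rough proxy for epistemic complexity,
# used here purely to illustrate the scale of the difference.

def logistic_regression_params(n_features: int) -> int:
    # One weight per input feature plus a single bias term.
    return n_features + 1

def mlp_params(layer_sizes: list[int]) -> int:
    # Fully connected layers: (in * out) weights plus out biases per layer.
    return sum(i * o + o for i, o in zip(layer_sizes, layer_sizes[1:]))

n_features = 100  # hypothetical input dimension

print(logistic_regression_params(n_features))   # 101
print(mlp_params([n_features, 256, 256, 2]))    # 92162
```

Even this modest network carries roughly three orders of magnitude more parameters than the logistic regression on the same inputs, which gives a first intuition for why estimating the reliability of DNN models is harder.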

