Papers
Topics
Authors
Recent
Search
2000 character limit reached

Which distribution were you sampled from? Towards a more tangible conception of data

Published 24 Jul 2024 in cs.LG | (2407.17395v3)

Abstract: Machine Learning research, as most of Statistics, heavily relies on the concept of a data-generating probability distribution. The standard presumption is that since data points are `sampled from' such a distribution, one can learn from observed data about this distribution and, thus, predict future data points which, it is presumed, are also drawn from it. Drawing on scholarship across disciplines, we here argue that this framework is not always a good model. Not only do such true probability distributions not exist; the framework can also be misleading and obscure both the choices made and the goals pursued in machine learning practice. We suggest an alternative framework that focuses on finite populations rather than abstract distributions; while classical learning theory can be left almost unchanged, it opens new opportunities, especially to model sampling. We compile these considerations into five reasons for modelling machine learning -- in some settings -- with finite populations rather than generative distributions, both to be more faithful to practice and to provide novel theoretical insights.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 4 tweets with 32 likes about this paper.