Using Ramsey theory to measure unavoidable spurious correlations in Big Data
Abstract: Given a dataset, we quantify how many patterns must always exist in it. Formally, this is done through the lens of the Ramsey theory of graphs and a quantitative bound known as Goodman's theorem. Combining statistical tools with the Ramsey theory of graphs gives a nuanced understanding of how far a dataset is from random and of what qualifies as a meaningful pattern. The method is applied to a dataset of repeated voters in the 1984 US Congress to quantify how homogeneous a subset of congressional voters is. We also measure how transitive a subset of voters is. Statistical Ramsey theory is also applied to global economic trading data, providing evidence that global markets are quite transitive.
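To make the idea of unavoidable patterns concrete, here is a minimal Python sketch of Goodman's bound. It assumes the standard statement of the theorem: a lower bound on the number of monochromatic triangles in any 2-colouring of the edges of the complete graph on n vertices. The function names `goodman_lower_bound` and `count_monochromatic_triangles` are illustrative and do not come from the paper, and the brute-force counter is only meant for small graphs.

```python
from itertools import combinations


def goodman_lower_bound(n):
    """Fewest monochromatic triangles in any 2-colouring of K_n (Goodman's theorem)."""
    if n % 2 == 0:
        return n * (n - 2) * (n - 4) // 24
    if n % 4 == 1:
        return n * (n - 1) * (n - 5) // 24
    return (n + 1) * (n - 3) * (n - 4) // 24  # n = 3 (mod 4)


def count_monochromatic_triangles(n, red_edges):
    """Brute-force count of triangles whose three edges share one colour.

    Vertices are 0..n-1; every pair listed in red_edges is red, and all
    other pairs are blue. (Hypothetical helper, cubic in n.)
    """
    red = {frozenset(edge) for edge in red_edges}
    count = 0
    for a, b, c in combinations(range(n), 3):
        colours = {frozenset(pair) in red for pair in ((a, b), (a, c), (b, c))}
        if len(colours) == 1:  # all three edges red, or all three blue
            count += 1
    return count


# Colouring K_5 with a red 5-cycle (blue is the complementary 5-cycle)
# contains no monochromatic triangle, matching the bound for n = 5.
pentagon = [(i, (i + 1) % 5) for i in range(5)]

# Colouring K_6 with a red complete bipartite graph K_{3,3} leaves exactly
# the two blue triangles inside the parts, matching the bound for n = 6.
bipartite = [(u, v) for u in range(3) for v in range(3, 6)]

print(count_monochromatic_triangles(5, pentagon), goodman_lower_bound(5))
print(count_monochromatic_triangles(6, bipartite), goodman_lower_bound(6))
```

Any excess of monochromatic triangles over this bound is what a statistical comparison would need to explain. The bound itself is present in every dataset of that size and so carries no information.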