Computing Approximate Statistical Discrepancy
Abstract: Consider a geometric range space $(X,\mathcal{A})$ where each data point $x \in X$ has two or more values (say $r(x)$ and $b(x)$). Also consider a function $\Phi(A)$, defined for any range $A \in (X,\mathcal{A})$ in terms of the sums of values in that range, e.g., $r_A = \sum_{x \in A} r(x)$ and $b_A = \sum_{x \in A} b(x)$. The $\Phi$-maximum range is $A^* = \arg \max_{A \in (X,\mathcal{A})} \Phi(A)$. Our goal is to find some $\hat{A}$ such that $|\Phi(\hat{A}) - \Phi(A^*)| \leq \varepsilon$. We develop algorithms for this problem for range spaces with bounded VC-dimension, as well as significant improvements for those defined by balls, halfspaces, and axis-aligned rectangles. This problem has applications in many areas, including discrepancy evaluation, classification, and spatial scan statistics.
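To make the problem statement concrete, the following is a minimal brute-force sketch for one simple range space: closed intervals over points on a line. It is not one of the paper's algorithms. The function names (`phi`, `max_range_intervals`, `is_eps_approx`), the choice $\Phi(A) = |r_A - b_A|$, and the assumption that point coordinates are distinct are all illustrative choices made here, not taken from the abstract.

```python
def phi(r_a, b_a):
    # Hypothetical choice of Phi: absolute difference of the two sums.
    return abs(r_a - b_a)


def max_range_intervals(points, phi=phi):
    """Exhaustively find the Phi-maximum interval.

    points: list of (coordinate, r, b) tuples with distinct coordinates.
    Returns (best_value, (lo, hi)); the range is None if no interval
    beats the empty range.
    """
    pts = sorted(points)
    best_val, best_rng = phi(0.0, 0.0), None
    for i in range(len(pts)):
        r_a = b_a = 0.0
        # Extend the interval one point at a time, updating the sums.
        for j in range(i, len(pts)):
            r_a += pts[j][1]
            b_a += pts[j][2]
            val = phi(r_a, b_a)
            if val > best_val:
                best_val, best_rng = val, (pts[i][0], pts[j][0])
    return best_val, best_rng


def is_eps_approx(phi_hat, phi_star, eps):
    # The abstract's goal: |Phi(A_hat) - Phi(A*)| <= eps.
    return abs(phi_hat - phi_star) <= eps


points = [(0.1, 1, 0), (0.2, 1, 0), (0.5, 0, 1), (0.7, 1, 0), (0.9, 0, 1)]
best_val, best_rng = max_range_intervals(points)
print(best_val, best_rng)
```

This exhaustive search evaluates every one of the $O(n^2)$ intervals, so it serves only as a reference against which a candidate $\hat{A}$ can be checked with `is_eps_approx`.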