Disentangle causes of rejection in goodness-of-fit tests for power-law frequency distributions

Determine whether rejections (small p-values) produced by maximum-likelihood goodness-of-fit tests for power-law frequency distributions are caused by deviations from the assumed parametric law P(x|θ) (hypothesis H1) or by violations of independence among observations (hypothesis H2). Specifically, for datasets analyzed under the independent and identically distributed (iid) framework, ascertain whether an observed rejection stems from an incorrect functional form of the distribution (e.g., a power-law model P(x|θ)=Cx^{-γ}) or from temporal or spatial correlations that invalidate the independence assumption used by the test.

Background

The monograph discusses standard likelihood-based goodness-of-fit tests for frequency distributions, which are often applied to evaluate power-law behavior in complex systems. These tests typically assume independent and identically distributed (iid) samples, so a single test jointly evaluates two hypotheses: H1 (the data follow a specific parametric distribution, such as a power law) and H2 (the observations are independent).
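
The place where H2 enters a likelihood-based test can be written out explicitly. The following is a standard textbook identity rather than a formula quoted from the monograph, and the symbol N (the number of observations) is introduced here for illustration only. Under the iid assumption the joint probability of the sample factorizes into single-observation terms:

```latex
\mathcal{L}(\theta)
  = P(x_1, \dots, x_N \mid \theta)
  = \prod_{i=1}^{N} P(x_i \mid \theta),
\qquad
\ln \mathcal{L}(\theta) = \sum_{i=1}^{N} \ln P(x_i \mid \theta)
```

The second equality holds only if the observations are independent (H2). When they are correlated, the joint probability does not factorize in general, so a test built on the product form can reject even when each observation follows P(x|θ) exactly.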

In complex systems, independence is frequently violated because of temporal or spatial correlations (e.g., in earthquake sequences or text data). Consequently, a statistical rejection may reflect either a mismatch in the chosen parametric form (H1) or a failure of the independence assumption (H2). The author highlights that current practice commonly conflates these two effects, leaving the true cause of a rejection ambiguous. Resolving this ambiguity is essential for correctly interpreting reported violations of power-law behavior and for designing tests or models that account for dependence in the data.
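
The ambiguity described above can be illustrated with a small simulation. The sketch below is not the procedure used in the monograph: it assumes a continuous power law with an illustrative exponent and lower cutoff (`GAMMA`, `XMIN`), uses a Kolmogorov-Smirnov test against the known true distribution instead of a maximum-likelihood fit, and introduces dependence through a Gaussian AR(1) process mapped to power-law values (a construction chosen here only because it leaves the marginal distribution exactly unchanged). All function names are hypothetical.

```python
import math
import random

GAMMA = 2.5  # power-law exponent (illustrative assumption)
XMIN = 1.0   # lower cutoff (illustrative assumption)


def power_law_cdf(x):
    """CDF of the continuous power law p(x) = C x^{-GAMMA} for x >= XMIN."""
    return 1.0 - (x / XMIN) ** (-(GAMMA - 1.0))


def power_law_quantile(u):
    """Inverse CDF: maps a uniform number in [0, 1) to a power-law value."""
    return XMIN * (1.0 - u) ** (-1.0 / (GAMMA - 1.0))


def sample_iid(n, rng):
    """H1 and H2 both hold: independent draws from the power law."""
    return [power_law_quantile(rng.random()) for _ in range(n)]


def sample_correlated(n, rho, rng):
    """H1 holds, H2 fails: same marginal, but consecutive values are dependent.

    A stationary Gaussian AR(1) series is pushed through the normal CDF
    (giving uniform marginals) and then through the power-law quantile.
    """
    z = rng.gauss(0.0, 1.0)
    scale = math.sqrt(1.0 - rho * rho)  # keeps the variance of z equal to 1
    out = []
    for _ in range(n):
        u = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        out.append(power_law_quantile(min(u, 1.0 - 1e-12)))  # guard u == 1
        z = rho * z + scale * rng.gauss(0.0, 1.0)
    return out


def ks_pvalue(data):
    """Asymptotic KS p-value against the true power-law CDF (iid assumed)."""
    n = len(data)
    d = 0.0
    for i, x in enumerate(sorted(data)):
        f = power_law_cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    lam = math.sqrt(n) * d
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * series))


def rejection_rate(sampler, trials=200, alpha=0.05):
    """Fraction of simulated datasets rejected at significance level alpha."""
    return sum(ks_pvalue(sampler()) < alpha for _ in range(trials)) / trials


rng = random.Random(0)
rate_iid = rejection_rate(lambda: sample_iid(500, rng))
rate_corr = rejection_rate(lambda: sample_correlated(500, 0.9, rng))
print(f"rejection rate, iid samples:        {rate_iid:.2f}")
print(f"rejection rate, correlated samples: {rate_corr:.2f}")
```

In both cases every observation follows the power law exactly, so H1 is true by construction. Any excess of rejections in the correlated case over the nominal level is therefore attributable to H2 alone, which is the kind of rejection that a test assuming iid data cannot distinguish from a genuine deviation in functional form.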

References

When a statistical test leads to a rejection (small p-value), as used in the recent claims of violation of power laws, it rejects the compound hypothesis (H1+H2). It is not clear if it is due to a systematic deviation of the parametric-form of the law (H1), or, instead, due to the well-known fact that observations are not independent (H2).

Statistical Laws in Complex Systems (2407.19874 - Altmann, 2024), Section 3.4.1 (Independence hypothesis), Chapter 3 (From data to laws)