How accurate are Bayes factor-based null hypothesis tests? A simulation study
Abstract: Bayes factor null hypothesis tests provide a viable alternative to frequentist measures of evidence quantification. Bayes factors for realistic data sets in areas like psychology cannot be calculated exactly and require numerical approximations to complex integrals. Crucially, the accuracy of these approximations, i.e., whether an approximate Bayes factor corresponds to the exact Bayes factor, is unknown and may depend on the data, prior, and likelihood. We have recently developed a novel statistical procedure, namely marginal simulation-based calibration (SBC) for Bayes factors, to test whether the computed Bayes factors for a given analysis are accurate. Here, we use marginal SBC for Bayes factors and calibration plots to test whether Bayes factors are calculated accurately for some common cognitive designs. We use the bridgesampling/brms packages in R. We run analyses for three commonly used designs in psychology and psycholinguistics: (a) a design with random effects for subjects only, (b) a Latin square design with crossed random effects for subjects and items, but a single fixed factor, and (c) a 2x2 Latin square design with crossed random effects for subjects and items. We find that Bayes factor estimates turn out to be accurate in cases where the bridgesampling algorithm does not issue a warning message, but can be biased and liberal when a warning message is shown. These results support the use of brms/bridgesampling for null hypothesis Bayes factor tests in commonly used factorial designs. They also suggest that when a warning message is issued, Bayes factor results should not be trusted. The results show that it is practical to check whether Bayes factors are computed correctly.
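The core idea behind the calibration check can be illustrated in a toy setting. The sketch below is not the paper's procedure with brms/bridgesampling; it is a minimal, hypothetical conjugate example where the exact Bayes factor is available in closed form, so the calibration property that marginal SBC exploits, namely that posterior model probabilities computed from simulations under the prior must average back to the prior model probability, can be verified directly:

```python
# Toy calibration check in the spirit of marginal SBC for Bayes factors.
# Hypothetical conjugate setting (assumed for illustration, not from the paper):
#   M0: y ~ Normal(0, 1)
#   M1: theta ~ Normal(0, 1), y | theta ~ Normal(theta, 1),
#       so marginally under M1: y ~ Normal(0, sqrt(2)).
# The exact Bayes factor BF10 is a ratio of two normal densities, so we can
# check the identity E[P(M1 | y)] = P(M1) under prior simulation.
import numpy as np

rng = np.random.default_rng(1)
n_sims = 20_000
prior_p_m1 = 0.5  # equal prior model probabilities

def normal_pdf(y, sd):
    return np.exp(-0.5 * (y / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# 1. Draw the model indicator from its prior, then simulate data from that model.
m1 = rng.random(n_sims) < prior_p_m1
theta = rng.normal(size=n_sims)
y = np.where(m1, theta + rng.normal(size=n_sims), rng.normal(size=n_sims))

# 2. Compute the exact Bayes factor and the posterior model probability.
bf10 = normal_pdf(y, np.sqrt(2.0)) / normal_pdf(y, 1.0)
post_p_m1 = bf10 * prior_p_m1 / (bf10 * prior_p_m1 + (1 - prior_p_m1))

# 3. Calibration check: with exact Bayes factors, the posterior probabilities
#    must average back to the prior model probability (here 0.5). A biased
#    Bayes factor approximation would show up as a systematic deviation.
print(round(float(post_p_m1.mean()), 2))
```

In practice the Bayes factors in step 2 would come from a numerical approximation such as bridge sampling, and a systematic mismatch between the average posterior model probability and the prior (or a miscalibrated distribution in the calibration plot) would flag inaccurate Bayes factor estimates.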