Living Synthetic Benchmarks: A Neutral and Cumulative Framework for Simulation Studies

Published 22 Oct 2025 in stat.ME | (2510.19489v1)

Abstract: Simulation studies are widely used to evaluate statistical methods. However, new methods are often introduced and evaluated using data-generating mechanisms (DGMs) devised by the same authors. This coupling creates misaligned incentives, e.g., the need to demonstrate the superiority of new methods, potentially compromising the neutrality of simulation studies. Furthermore, results of simulation studies are often difficult to compare due to differences in DGMs, competing methods, and performance measures. This fragmentation can lead to conflicting conclusions, hinder methodological progress, and delay the adoption of effective methods. To address these challenges, we introduce the concept of living synthetic benchmarks. The key idea is to disentangle method and simulation study development and continuously update the benchmark whenever a new DGM, method, or performance measure becomes available. This separation benefits the neutrality of method evaluation, emphasizes the development of both methods and DGMs, and enables systematic comparisons. In this paper, we outline a blueprint for building and maintaining such benchmarks, discuss the technical and organizational challenges of implementation, and demonstrate feasibility with a prototype benchmark for publication bias adjustment methods. We conclude that living synthetic benchmarks have the potential to foster neutral, reproducible, and cumulative evaluation of methods, benefiting both method developers and users.
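
The separation described in the abstract can be pictured as a registry in which DGMs, methods, and performance measures are added independently, and only the missing (DGM, method) combinations are run on each update. The following is a minimal sketch under that assumption, not the authors' prototype: the class `LivingBenchmark`, its method names, the toy normal-mean DGM, and the `trimmed_mean` helper are all hypothetical and chosen only for illustration.

```python
import random
import statistics


class LivingBenchmark:
    """Hypothetical registry of DGMs, methods, and measures (illustration only)."""

    def __init__(self, n_reps=200, seed=1):
        self.dgms = {}       # name -> (simulate(rng) -> data, true parameter value)
        self.methods = {}    # name -> estimate(data) -> float
        self.measures = {}   # name -> fn(estimates, truth) -> float
        self.estimates = {}  # (dgm name, method name) -> list of stored estimates
        self.n_reps = n_reps
        self.seed = seed

    def add_dgm(self, name, simulate, truth):
        self.dgms[name] = (simulate, truth)

    def add_method(self, name, estimate):
        self.methods[name] = estimate

    def add_measure(self, name, fn):
        self.measures[name] = fn

    def update(self):
        """Run only the (DGM, method) cells not computed yet; return how many ran."""
        n_new = 0
        for d, (simulate, _truth) in self.dgms.items():
            for m, estimate in self.methods.items():
                if (d, m) in self.estimates:
                    continue  # earlier results are kept, not recomputed
                # Seeding by DGM name gives every method the same datasets.
                rng = random.Random(f"{self.seed}-{d}")
                self.estimates[(d, m)] = [
                    estimate(simulate(rng)) for _ in range(self.n_reps)
                ]
                n_new += 1
        return n_new

    def results(self):
        """Apply every registered measure to every stored cell."""
        return {
            (d, m, k): fn(est, self.dgms[d][1])
            for (d, m), est in self.estimates.items()
            for k, fn in self.measures.items()
        }


def trimmed_mean(x):
    """Mean after dropping the lowest and highest 10% of observations."""
    k = len(x) // 10
    return statistics.fmean(sorted(x)[k:len(x) - k])


bench = LivingBenchmark()
bench.add_dgm(
    "normal_mean_0.3",
    lambda rng: [rng.gauss(0.3, 1.0) for _ in range(30)],
    truth=0.3,
)
bench.add_method("mean", statistics.fmean)
bench.add_method("median", statistics.median)
bench.add_measure("bias", lambda est, truth: statistics.fmean(est) - truth)

first = bench.update()   # runs both existing cells
bench.add_method("trimmed_mean", trimmed_mean)
second = bench.update()  # runs only the cell for the newly added method
```

Because the raw estimates are stored rather than the summary statistics, a performance measure registered later can be applied to all existing cells without re-running any method. This is one possible design choice, not something the abstract specifies.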
