Tuning Random Generators: Property-Based Testing as Probabilistic Programming

Published 20 Aug 2025 in cs.PL and cs.SE | (2508.14394v1)

Abstract: Property-based testing validates software against an executable specification by evaluating it on randomly generated inputs. The standard way that PBT users generate test inputs is via generators that describe how to sample test inputs through random choices. To achieve a good distribution over test inputs, users must tune their generators, i.e., decide on the weights of these individual random choices. Unfortunately, it is very difficult to understand how to choose individual generator weights in order to achieve a desired distribution, so today this process is tedious and limits the distributions that can be practically achieved. In this paper, we develop techniques for the automatic and offline tuning of generators. Given a generator with undetermined symbolic weights and an objective function, our approach automatically learns values for these weights that optimize for the objective. We describe useful objective functions that allow users to (1) target desired distributions and (2) improve the diversity and validity of their test cases. We have implemented our approach in a novel discrete probabilistic programming system, Loaded Dice, that supports differentiation and parameter learning, and use it as a language for generators. We empirically demonstrate that our approach is effective at optimizing generator distributions according to the specified objective functions. We also perform a thorough evaluation on PBT benchmarks, demonstrating that, when automatically tuned for diversity and validity, the generators exhibit a 3.1-7.4x speedup in bug finding.

Summary

The paper introduces Loaded Dice, a system that uses probabilistic and differentiable programming to automatically tune random generators in property-based testing.
The approach leverages binary decision diagrams and gradient descent to optimize generator weights, achieving bug-finding speedups between 3.1x and 7.4x.
Objective functions based on Kullback-Leibler divergence and specification entropy steer tuning toward a target distribution or toward diverse, valid test cases.

Introduction

The paper "Tuning Random Generators: Property-Based Testing as Probabilistic Programming" (2508.14394) explores techniques for automatically tuning the random input generators used in Property-Based Testing (PBT). The goal is to optimize the distribution of test cases so that bugs are found more efficiently. The authors introduce a system called Loaded Dice, which combines probabilistic and differentiable programming to tune generator weights automatically against a specified objective function.

Loaded Dice: A Probabilistic Programming System

Loaded Dice is a novel discrete probabilistic programming system that extends Dice, a discrete probabilistic programming language. It supports differentiation and parameter learning, enabling the automatic tuning of PBT generators. Developers leave the weights in a generator symbolic, and Loaded Dice learns values for them that optimize a user-defined objective.

Implementation Details

Syntax and Semantics: Loaded Dice uses a first-order functional programming style with extensions for probabilistic programming constructs. It includes support for symbolic weights that represent random choices in generators.
Differentiable Programming: The system compiles generators to binary decision diagrams (BDDs) to perform efficient probabilistic inference and compute gradients. These gradients are used to adjust weights through gradient descent, optimizing generator distributions.
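The tuning loop described above can be illustrated with a minimal Python sketch. It is not Loaded Dice code: exact enumeration of a tiny list-length generator stands in for BDD-based inference, a central finite difference stands in for exact gradients, and the names (`length_distribution`, `tune`), the squared-error objective on expected length, and all constants are assumptions made for illustration.

```python
import math


def sigmoid(w):
    """Map an unconstrained parameter to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-w))


def length_distribution(theta, max_len):
    """Exact distribution over list lengths for a generator that, at each
    step, adds an element with probability theta and stops otherwise.
    (Enumeration here is an assumed stand-in for BDD-based inference.)"""
    dist = []
    p_reach = 1.0  # probability of having added k elements so far
    for _ in range(max_len):
        dist.append(p_reach * (1.0 - theta))  # stop now
        p_reach *= theta                      # or add one more element
    dist.append(p_reach)  # forced stop once max_len is reached
    return dist


def expected_length(theta, max_len):
    return sum(k * p for k, p in enumerate(length_distribution(theta, max_len)))


def loss(w, max_len, target_mean):
    """Hypothetical objective: squared error on the expected list length."""
    return (expected_length(sigmoid(w), max_len) - target_mean) ** 2


def tune(max_len=10, target_mean=4.0, lr=0.05, steps=2000, eps=1e-6):
    """Gradient descent on the single symbolic weight."""
    w = 0.0  # theta starts at 0.5
    for _ in range(steps):
        # Finite-difference gradient; an assumed substitute for exact gradients.
        grad = (loss(w + eps, max_len, target_mean)
                - loss(w - eps, max_len, target_mean)) / (2 * eps)
        w -= lr * grad
    return sigmoid(w)


theta = tune()
print("learned weight:", theta)
print("expected length:", expected_length(theta, 10))
```

Optimizing an unconstrained parameter `w` and passing it through a sigmoid keeps the learned weight a valid probability throughout descent; this is a design choice of the sketch, not something the summary attributes to the paper.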

Objective Functions for Generator Tuning

The paper describes several objective functions that can guide the optimization of generator weights:

Target Distribution: Developers can specify a desired distribution over some feature of the generated test cases. The system uses Kullback-Leibler divergence to minimize the distance between the generator's distribution and the target distribution.
Diversity and Validity: To maximize the diversity and validity of test cases, the paper introduces specification entropy as an objective. This metric combines entropy with notions of validity, ensuring that generated test cases are both diverse and valid according to a given specification.
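Both objectives can be made concrete in a small Python sketch. Several things here are assumptions rather than details from the paper: the direction of the divergence, KL(target || generator); the softmax parameterization of the weights; and `entropy_over_valid`, which is only one plausible reading of specification entropy (entropy terms summed over valid outputs only).

```python
import math


def softmax(logits):
    """Turn unconstrained logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target, actual):
    """KL(target || actual); terms with zero target probability contribute 0."""
    return sum(t * math.log(t / a) for t, a in zip(target, actual) if t > 0)


def tune_to_target(target, lr=0.5, steps=500):
    """Gradient descent on logits so that softmax(logits) approaches target."""
    logits = [0.0] * len(target)
    for _ in range(steps):
        probs = softmax(logits)
        # d/d logit_i of KL(target || softmax(logits)) is probs[i] - target[i].
        logits = [z - lr * (q - t) for z, q, t in zip(logits, probs, target)]
    return softmax(logits)


def entropy_over_valid(dist, is_valid):
    """ASSUMED stand-in for specification entropy:
    -sum of p(x) * log p(x), taken over valid outputs x only."""
    return -sum(p * math.log(p)
                for x, p in dist.items() if is_valid(x) and p > 0)


# Target distribution over a hypothetical three-valued feature of test cases.
target = [0.5, 0.3, 0.2]
tuned = tune_to_target(target)
print("KL after tuning:", kl_divergence(target, tuned))


# Toy validity predicate (hypothetical): an output is valid if it is even.
def is_even(x):
    return x % 2 == 0


spread_valid = {0: 0.5, 2: 0.5}   # mass spread over valid outputs
single_valid = {0: 1.0}           # valid but not diverse
all_invalid = {1: 0.5, 3: 0.5}    # diverse but never valid
for name, dist in [("spread", spread_valid), ("single", single_valid),
                   ("invalid", all_invalid)]:
    print(name, entropy_over_valid(dist, is_even))
```

Under this assumed definition, a generator scores well only when it spreads probability across many valid outputs; concentrating on one valid output or on invalid outputs both score zero in the toy cases above.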

Techniques for Effective Tuning

The authors highlight techniques to construct more tunable generators:

Parameterizing Dependencies: Introduce dependencies in weights based on the execution context, such as function parameters or previous random choices. This allows more expressive control over generator distributions.
Frontloading Choices: Structure generators to make early probabilistic choices that affect subsequent sampling, enabling correlated random choices and better distribution control.
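A small Python sketch suggests why these two restructurings add expressive power. The list generator, the function names, and the closed-form weights are hypothetical illustrations and are not taken from the paper. With one shared weight, list lengths follow a truncated geometric distribution; with a separate weight per remaining size, a uniform distribution over lengths becomes reachable.

```python
import random


def gen_list(size, weights, rng):
    """Parameterized dependency: the probability of adding an element is
    looked up by `size` (the remaining budget) instead of being one constant."""
    if size == 0 or rng.random() >= weights[size]:
        return []
    return [rng.randint(0, 9)] + gen_list(size - 1, weights, rng)


def length_distribution(max_size, weights):
    """Exact distribution over the lengths produced by gen_list."""
    dist, p_reach = [], 1.0
    for size in range(max_size, 0, -1):
        dist.append(p_reach * (1.0 - weights[size]))  # stop at this point
        p_reach *= weights[size]                      # or keep going
    dist.append(p_reach)  # budget exhausted
    return dist


def gen_list_frontloaded(length_weights, rng):
    """Frontloading: choose the length first, then fill in the elements,
    so every later step is correlated with that early choice."""
    lengths = list(range(len(length_weights)))
    length = rng.choices(lengths, weights=length_weights, k=1)[0]
    return [rng.randint(0, 9) for _ in range(length)]


max_size = 4
# One shared weight: every size uses the same probability.
shared = {s: 0.5 for s in range(1, max_size + 1)}
# One weight per size; s / (s + 1) makes every length equally likely.
per_size = {s: s / (s + 1) for s in range(1, max_size + 1)}

print("shared weight:   ", length_distribution(max_size, shared))
print("per-size weights:", length_distribution(max_size, per_size))

rng = random.Random(0)
print(gen_list(max_size, per_size, rng))
print(gen_list_frontloaded([1, 1, 1, 1, 1], rng))
```

In a tuning setting the per-size weights would be learned rather than written by hand; the closed form is used here only so the resulting distribution can be checked exactly.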

Performance and Results

The paper presents empirical results showing the effectiveness of Loaded Dice in tuning generators:

Generators tuned automatically for diversity and validity find bugs 3.1x to 7.4x faster than untuned generators.
Evaluations on benchmarks for binary search trees (BST), red-black trees (RBT), and simply-typed lambda calculus (STLC) demonstrate the system's ability to optimize for specified distributions and achieve greater diversity and validity in generated test cases.

Conclusion

The authors conclude that automatic tuning of PBT generators using Loaded Dice leads to improved bug-finding efficiency. The approach allows developers to declaratively specify generator distribution goals, providing better control over test input generation without manual tuning. Future work includes extending the framework to support more complex generator constructs and adaptive sizing strategies.

Implications and Future Work

Loaded Dice lets developers state the test-case distribution they want and have generator weights optimized toward it, replacing tedious manual tuning. For software quality assurance, this could make automated testing both more robust and more efficient. Extending these methods to more complex and dynamic test scenarios is a promising direction for future research.