Statistical Rejection Sampling Optimization
- RSO is a meta-framework that optimizes classical rejection sampling via adaptive proposals, envelope refinement, and divergence minimization to achieve efficient and robust sampling.
- It unifies methods like entropy-optimal discrete sampling and adaptive envelope optimization, offering provable theoretical guarantees and improved computational efficiency.
- RSO has broad applications in Bayesian inference, high-dimensional structured modeling, and machine learning, including enhanced variational inference and LLM alignment.
Statistical Rejection Sampling Optimization (RSO) encompasses a collection of algorithms and frameworks that optimize classical rejection sampling schemes for efficient, robust, and often provably optimal sampling from complex distributions. RSO unifies insights from information theory, adaptive proposal construction, hybrid variational inference, and algorithmic learning to achieve optimality in entropy, computational cost, or accuracy. Its applications range from discrete random variate generation and Bayesian variational inference to preference modeling in large-scale machine learning and high-dimensional structured modeling.
1. Fundamental Principles of Rejection Sampling Optimization
Statistical rejection sampling involves generating candidate samples from an easily sampled proposal distribution and accepting or rejecting them based on their importance weights with respect to the unnormalized target density. The classical acceptance probability is

$$\alpha(x) = \frac{p(x)}{M\,q(x)},$$

where $p$ is the target density (possibly unnormalized), $q$ is the proposal, and $M \geq \sup_x p(x)/q(x)$ is a tight upper bound on the density ratio.
Optimization in the RSO paradigm aims to select $q$ and $M$ so as to maximize acceptance probability (i.e., minimize $M$), ensure theoretical guarantees (e.g., unbiasedness, minimal divergence), and scale efficiently in memory and computational resources. RSO extends classical rejection sampling by optimizing envelopes, adaptively refining proposals, linking to divergence minimization, or improving entropy efficiency.
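The classical accept/reject loop can be sketched in a few lines. The Laplace proposal, envelope constant, and sample count below are illustrative choices for a standard normal target, not drawn from any of the cited papers.

```python
import math
import random

def rejection_sample(p, q_sample, q_pdf, M):
    """Draw one exact sample from the (possibly unnormalized) target p
    using proposal q with envelope constant M >= sup_x p(x)/q(x)."""
    while True:
        x = q_sample()
        # Accept x with probability p(x) / (M * q(x)) <= 1.
        if random.random() * M * q_pdf(x) <= p(x):
            return x

# Toy example: sample a standard normal target from a Laplace(0, 1) proposal.
def p(x):
    return math.exp(-0.5 * x * x)          # unnormalized N(0, 1) density

def q_pdf(x):
    return 0.5 * math.exp(-abs(x))         # Laplace(0, 1) density

def q_sample():
    mag = random.expovariate(1.0)          # |x| ~ Exp(1), sign ~ fair coin
    return mag if random.getrandbits(1) else -mag

M = 2.0 * math.exp(0.5)                    # sup_x p(x)/q(x), attained at |x| = 1
random.seed(0)
draws = [rejection_sample(p, q_sample, q_pdf, M) for _ in range(5000)]
```

The envelope constant is exact here ($p/q = 2e^{|x| - x^2/2}$, maximized at $|x| = 1$); a looser $M$ would still be correct but would lower the acceptance rate.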
2. Entropy-Optimal RSO for Discrete Distributions
A prominent RSO instantiation is the Amplified Loaded Dice Roller (ALDR) for sampling from a discrete distribution with rational probabilities via an unbiased entropy source (coin flips) (Draper et al., 5 Apr 2025). The ALDR constructs a dyadic-proposal tree through a preprocessing phase:
- Inputs: integer weights $(a_1, \dots, a_n)$ with total $m = \sum_i a_i$.
- Amplification: choose an amplification factor $c$ and depth $k$ with $c\,m \leq 2^k$, and define amplified weights $\tilde{a}_i = c\,a_i$ (with $\tilde{a}_{n+1} = 2^k - c\,m$ for reject).
- Build arrays $h$ (leaf counts per level) and $H$ (flattened leaf labels) in linear time and space.
Sampling proceeds by bitwise descent in the tree: on a leaf hit with label $i$, return $i$ if $i \leq n$; otherwise, restart. The expected number of fair coin flips per sample satisfies

$$H(p) \leq \mathbb{E}[\text{flips}] < H(p) + 2,$$

where $H(p)$ is the Shannon entropy of the target. This achieves information-theoretic optimality to within 2 bits, using only linear storage and preprocessing; no prior discrete sampler achieved this (Draper et al., 5 Apr 2025).
Empirical results show ALDR outperforms the alias method in both entropy efficiency and wall-clock sampling time, especially for sparse or low-entropy distributions (Draper et al., 5 Apr 2025).
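The dyadic-rejection idea can be illustrated without the full tree machinery. The sketch below flattens the bitwise descent into drawing $k$ fair bits at once and restarting on reject outcomes; it samples the target distribution exactly, but omits ALDR's amplification and level-by-level early exit, so it is not entropy-optimal.

```python
import random

def dyadic_sampler(weights):
    """Coin-flip-only sampler for integer weights: a flat 'dyadic rejection'
    stand-in for the ALDR tree. Draw k fair bits to get u uniform on
    [0, 2^k); outcomes with u >= m are reject leaves and trigger a restart."""
    m = sum(weights)
    k = max(1, (m - 1).bit_length())       # smallest k with 2^k >= m
    cumulative, total = [], 0
    for w in weights:
        total += w
        cumulative.append(total)

    def sample():
        while True:
            u = 0
            for _ in range(k):             # one fair coin flip per tree level
                u = (u << 1) | random.getrandbits(1)
            if u < m:                      # labeled leaf: map u to its index
                for i, c in enumerate(cumulative):
                    if u < c:
                        return i
            # u >= m: reject leaf, restart

    return sample

random.seed(1)
draw = dyadic_sampler([1, 2, 5])           # target P = (1/8, 2/8, 5/8)
counts = [0, 0, 0]
for _ in range(8000):
    counts[draw()] += 1
```

With $m = 8$ a power of two, every leaf is labeled and no restarts occur; for non-dyadic totals the restart branch fires with probability $(2^k - m)/2^k$.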
3. RSO in Monte Carlo Variational Inference
Several RSO approaches refine variational inference through a rejection sampling lens. In "Refined $\alpha$-Divergence Variational Inference via Rejection Sampling" (Sharma et al., 2019), the key observation is that the worst-case log density ratio, given by the minimal envelope constant $M$, equates to the Rényi divergence of order infinity: $\log M = \sup_x \log \frac{p(x)}{q(x)} = D_\infty(p \,\|\, q)$. The presented "two-stage" algorithm combines:
- Stage 1: Minimize Monte Carlo estimates of $D_\alpha(p \,\|\, q_\phi)$ for finite $\alpha$ to optimize the proposal parameters $\phi$.
- Stage 2: Use the learned $q_\phi$ to perform rejection sampling with the envelope $M\,q_\phi$, forming an improved, sample-based approximation $r$.
Theoretical results guarantee

$$D_\alpha(p \,\|\, r) \leq D_\alpha(p \,\|\, q_\phi)$$

for all finite $\alpha$, with strict improvement unless the rejection step approaches triviality (nearly all samples accepted).
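A minimal sketch of the stage-2 refinement, with an illustrative Gaussian target and a deliberately over-dispersed Gaussian standing in for a fitted proposal (all specifics below are toy choices, not the paper's setup):

```python
import math
import random

def stage_two_refine(p, q_pdf, q_draws, M):
    """Stage-2 refinement sketch: thin draws from a fitted proposal q,
    keeping each x with probability p(x) / (M q(x)); the survivors are
    exact draws from the refined, sample-based approximation."""
    return [x for x in q_draws
            if random.random() * M * q_pdf(x) <= p(x)]

# Toy setting: target N(0, 1) (unnormalized), fitted proposal N(0, 2).
p = lambda x: math.exp(-0.5 * x * x)
q_pdf = lambda x: math.exp(-x * x / 8.0) / (2.0 * math.sqrt(2.0 * math.pi))
M = 2.0 * math.sqrt(2.0 * math.pi)         # sup_x p(x)/q(x), attained at x = 0

random.seed(2)
q_draws = [random.gauss(0.0, 2.0) for _ in range(10000)]
refined = stage_two_refine(p, q_pdf, q_draws, M)
```

Here the proposal's variance (4) overshoots the target's (1); the refined sample set recovers the target's variance exactly in distribution, at the cost of discarding roughly half the draws.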
A similar principle underlies "Variational Rejection Sampling" (Grover et al., 2018), where a smooth threshold parameter controls the trade-off between computational cost and posterior tightness. Accepted samples from the proposal are upweighted in the ELBO by their model likelihood, leading to significant improvements in marginal likelihood estimation.
4. Adaptive and Structured Envelope Optimization
A key family of RSO methods involves piecewise or data-driven proposal refinement.
(a) Adaptive Envelopes and Piecewise Majorization
The Vertical Weighted Strips (VWS) framework (Raim et al., 2024, Raim et al., 21 Sep 2025) addresses targets of the form $f(x) \propto w(x)\,g(x)$, where $g$ is a tractable base density. It constructs proposals by partitioning the domain into strips $A_1, \dots, A_k$ and assigning each a local supremum (majorizer) $\overline{w}_j$ and infimum (minorizer) $\underline{w}_j$ of the weight function $w$. The finite mixture proposal

$$h(x) \propto \sum_{j=1}^{k} \overline{w}_j \, g(x) \, \mathbf{1}\{x \in A_j\}$$

delivers tunable acceptance rates, with analytic pre-sampling upper bounds on the rejection probability derived from the majorizer-minorizer gaps. Adaptive partitioning splits high-contribution strips, driving the rejection probability below a user-specified target.
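A minimal VWS-style sketch, assuming a base density $g(x) = e^{-x}$ on a bounded window and a monotone-decreasing weight function so each strip's majorizer is simply $w$ at the strip's left edge. The partition and weight function are illustrative, not from the cited papers.

```python
import math
import random

def vws_sampler(w, edges):
    """Vertical-strips sketch for a target f(x) proportional to w(x) e^{-x}
    on [edges[0], edges[-1]), with w monotone decreasing so that the strip
    supremum (majorizer) is w evaluated at the strip's left edge."""
    strips = list(zip(edges[:-1], edges[1:]))
    w_sup = [w(a) for a, b in strips]                  # majorizer per strip
    g_mass = [math.exp(-a) - math.exp(-b) for a, b in strips]
    mix = [s * m for s, m in zip(w_sup, g_mass)]       # unnormalized mixture
    total = sum(mix)

    def sample():
        while True:
            u = random.random() * total                # strip ∝ w_sup * g-mass
            j = 0
            while j < len(mix) - 1 and u > mix[j]:
                u -= mix[j]
                j += 1
            a, b = strips[j]
            v = random.random()                        # truncated Exp(1) on [a, b)
            x = -math.log(math.exp(-a) - v * (math.exp(-a) - math.exp(-b)))
            if random.random() * w_sup[j] <= w(x):     # thin by w(x) / w_sup_j
                return x

    return sample

random.seed(3)
draw = vws_sampler(lambda x: 1.0 / (1.0 + x * x), [0.0, 0.5, 1.0, 2.0, 4.0, 12.0])
samples = [draw() for _ in range(5000)]
```

The pre-sampling acceptance rate is computable here before drawing anything, as $\sum_j \int_{A_j} w\,g \,/\, \sum_j \overline{w}_j \int_{A_j} g$; splitting the widest strips raises it toward 1.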
In the context of Gibbs sampling, self-tuned VWS maintains and refines persistent proposals for each conditional as the MCMC chain progresses, balancing refinement cost against rejection rates (Raim et al., 21 Sep 2025). In large-scale Bayesian applications such as small area estimation, self-tuned VWS dramatically improved effective sample size and eliminated autocorrelation in posterior draws.
(b) Generalized Adaptive Rejection Schemes
Beyond log-concave densities, (Martino et al., 2011) develops two adaptive envelope strategies—one piecewise and one using the ratio-of-uniforms representation—to handle multimodal and log-convex-tailed targets. Each rejected sample introduces a new support point, tightening local bounds and monotonically increasing acceptance probability.
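The refine-on-reject mechanic can be sketched with a piecewise-constant envelope for a bounded, monotone-decreasing density. This is a simplification for illustration; the cited schemes handle multimodal and log-convex-tailed targets with more sophisticated bound constructions.

```python
import bisect
import math
import random

def adaptive_envelope_sampler(f, lo, hi):
    """Adaptive rejection sketch for a bounded, monotone-decreasing density
    f on [lo, hi): piecewise-constant envelope with height f(left edge) on
    each piece; every rejected abscissa becomes a new support point, so the
    envelope tightens and the acceptance rate rises over time."""
    points = [lo, hi]

    def sample():
        while True:
            heights = [f(points[j]) for j in range(len(points) - 1)]
            widths = [points[j + 1] - points[j] for j in range(len(points) - 1)]
            masses = [h * w for h, w in zip(heights, widths)]
            u = random.random() * sum(masses)
            j = 0
            while j < len(masses) - 1 and u > masses[j]:   # pick envelope piece
                u -= masses[j]
                j += 1
            x = points[j] + random.random() * widths[j]
            if random.random() * heights[j] <= f(x):
                return x                       # accept
            bisect.insort(points, x)           # reject: add a support point

    return sample

random.seed(4)
draw = adaptive_envelope_sampler(lambda x: math.exp(-3.0 * x), 0.0, 1.0)
samples = [draw() for _ in range(3000)]
```

Initially the envelope is a single box of height $f(0)$, accepting about 32% of proposals for this target; each rejection inserts a breakpoint, so later draws face a much tighter envelope.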
5. RSO for Preference-Based Policy Optimization
In LLM alignment, "Statistical Rejection Sampling Improves Preference Optimization" (Liu et al., 2023) proposes RSO to bridge the sampling mismatch between target optimum and data-collecting distributions in Direct Preference Optimization (DPO) and Sequence Likelihood Calibration (SLiC). The key steps are:
- Compute the closed-form optimal policy $\pi^*(y \mid x) \propto \pi_{\text{sft}}(y \mid x)\, \exp\!\big(r(x, y)/\beta\big)$, where $r$ is the reward model and $\beta$ the KL-regularization coefficient.
- Use $\pi_{\text{sft}}$ as the proposal and accept each candidate $y$ with probability $\exp\!\big((r(x, y) - r_{\max})/\beta\big)$, where $r_{\max}$ is the maximum reward among the sampled candidates.
- Aggregate accepted samples for unbiased loss-based policy updates.
This explicitly generates on-policy preference pairs, yielding empirically higher win rates versus SFT and DPO-trained baselines on multiple LLM alignment benchmarks (Liu et al., 2023).
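The acceptance step can be sketched with a toy scalar "policy" and reward function, normalizing rewards by the batch maximum as one practical choice of envelope constant. All names and numbers below are illustrative stand-ins, not the paper's models.

```python
import math
import random

def rso_accept(candidates, reward, beta):
    """Statistical rejection sampling step (sketch): thin candidates drawn
    from the SFT policy toward pi*(y|x) ∝ pi_sft(y|x) exp(r(x,y)/beta) by
    accepting each y with probability exp((r(y) - r_max)/beta), using the
    batch maximum reward as the envelope normalizer."""
    r_max = max(reward(y) for y in candidates)
    return [y for y in candidates
            if random.random() <= math.exp((reward(y) - r_max) / beta)]

# Toy stand-in: 'responses' are scalars drawn from a unit-Gaussian 'SFT
# policy', and the reward model prefers values near zero.
random.seed(5)
reward = lambda y: -abs(y)
candidates = [random.gauss(0.0, 1.0) for _ in range(2000)]
accepted = rso_accept(candidates, reward, beta=0.25)
```

Smaller $\beta$ sharpens the selection toward high-reward responses at the cost of accepting fewer candidates; the accepted set is then ranked into preference pairs for loss-based training.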
6. RSO in Algorithmic Optimization and Learning
RSO also describes optimization strategies outside of probabilistic inference.
(a) Random Search Optimization for Neural Nets
"RSO: A Gradient Free Sampling Based Approach For Training Deep Neural Networks" (Tripathi et al., 2020) explores a perturb-and-reject Markov chain over neural network parameters: each weight in turn receives a proposed random perturbation, and the update is accepted only if the loss strictly decreases. Despite the absence of gradients, RSO efficiently discovers performant solutions with an order of magnitude fewer weight updates than SGD on MNIST and CIFAR-10, though with greater per-iteration cost (Tripathi et al., 2020).
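A minimal perturb-and-reject sketch on a toy least-squares "network"; the two-parameter linear model, step size, and round count are illustrative choices, not the paper's architecture.

```python
import random

def rso_train(loss, weights, n_rounds, step=0.1):
    """Gradient-free perturb-and-reject sketch: propose a Gaussian
    perturbation to one weight at a time and keep it only if the loss
    strictly decreases; otherwise revert."""
    best = loss(weights)
    for _ in range(n_rounds):
        for i in range(len(weights)):
            delta = random.gauss(0.0, step)
            weights[i] += delta
            trial = loss(weights)
            if trial < best:
                best = trial              # accept: strict improvement
            else:
                weights[i] -= delta       # reject: revert the perturbation
    return weights, best

# Toy 'network': fit y = 2x + 1 by mean squared error.
data = [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
mse = lambda w: sum((w[0] * x + w[1] - y) ** 2 for x, y in data) / len(data)

random.seed(6)
weights, final_loss = rso_train(mse, [0.0, 0.0], n_rounds=500)
```

Each round proposes one perturbation per weight, so the loop above performs 1000 loss evaluations; no gradient information is ever computed.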
(b) OS* Algorithm for Unified Sampling and Optimization
The OS* algorithm (Dymetman et al., 2012) generalizes RSO: it iteratively maintains an upper bounding proposal (efficient for either sampling or optimization), incrementally refines it using rejected samples, and provably concentrates computational resources on high-probability regions. This joint approach to exact sampling and search exploits locally tractable bounds and A*-style search in high-dimensional discrete or graphical-model settings.
7. Theoretical Characterization and Efficiency Boundaries
Theoretical analysis in RSO benchmarks optimality in entropy, divergence reduction, and mean-squared error or variance. For example:
- The minimax lower bound for adaptive rejection sampling guarantees that, absent additional structure, no method can drive the rejection rate below order $n^{-s/d}$ (up to logarithmic factors) after $n$ target-density evaluations, where the target has Hölder regularity $s$ in $d$ dimensions (Achdou et al., 2018).
- In variational inference, variational rejection sampling monotonically tightens the ELBO and interpolates between a loose, cheap bound and exact posterior approximation at the cost of increased computation (Grover et al., 2018, Sharma et al., 2019).
- In the discrete entropy-optimal case, ALDR matches the Knuth-Yao lower bound within 2 bits of entropy, with no exponential scaling in proposal space (Draper et al., 5 Apr 2025).
8. Empirical Impact and Application Breadth
RSO methods have demonstrated significant improvements across domains:
- In structured variate generation, ALDR achieves lower entropy cost and faster wall-clock sampling than the alias method for a broad class of discrete distributions (Draper et al., 5 Apr 2025).
- Variational RSO approaches dominate adaptive $\alpha$-divergence and classic RDVI baselines in latent-variable models and Bayesian neural networks (Grover et al., 2018, Sharma et al., 2019).
- Self-tuned VWS proposals in Gibbs sampling enable exact draws from nonstandard univariate conditionals—a key advance in large-scale Bayesian small-area estimation models (Raim et al., 21 Sep 2025, Raim et al., 2024).
- In LLM alignment, RSO delivers on-policy data and unbiased learning, improving human preference and automatic win rates (Liu et al., 2023).
RSO thus represents a meta-framework—encompassing both principled, information-bound methods and pragmatic, adaptive engineering—for optimizing sampling, inference, and learning wherever rejection-based schemes provide a tractable, exact mechanism but require careful control of proposal design, envelope tightness, or theoretical risk.