
Reservoir-Sampling Distribution Estimation

Updated 10 February 2026
  • Reservoir-sampling-based distribution estimation is a framework that uses scalable, unbiased sampling to estimate distributions in high-dimensional or streaming-data scenarios.
  • Techniques such as ReSWD and varoptₖ integrate importance weighting and variance-optimality to efficiently estimate sliced Wasserstein distances and subset-sums.
  • The approach is applied in machine learning, computer vision, and network analysis, demonstrating empirical improvements in error reduction and computational speed.

Reservoir-sampling-based distribution estimation refers to a family of techniques that leverage the statistical properties of reservoir sampling—classically developed for scalable, unbiased sampling from data streams—in order to construct distribution estimators with provable variance guarantees and optimality properties. These methods are especially relevant in high-dimensional or streaming-data contexts, where memory and computation constraints preclude exact inference over the full data. Recent research has extended the classical reservoir sampling framework using importance weighting, variance-optimality criteria, and integration with measures such as sliced Wasserstein distances, enabling robust and scalable solutions to distribution matching and subset-sum estimation problems in statistics, machine learning, computer vision, and graphics.

1. Core Principles of Reservoir Sampling

Reservoir sampling is designed to maintain a sample of size $k$ from a potentially unbounded stream of data, such that every element seen thus far has a specified inclusion probability in the reservoir. For unweighted streams, classical one-pass algorithms such as Vitter's Algorithm R ensure uniform inclusion probability. For weighted streams, each item is assigned a nonnegative weight $w_i$ and inclusion probabilities must be proportional to $w_i$; the Efraimidis–Spirakis key-based method handles this case.
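As a concrete baseline, the unweighted one-pass scheme can be sketched as follows (a minimal illustration of Vitter's Algorithm R, not code from the cited papers):

```python
import random

def reservoir_sample(stream, k, rng=random.Random(0)):
    """Vitter's Algorithm R: maintain a uniform sample of size k
    from a stream of unknown length in a single pass."""
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            sample.append(x)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)  # item i replaces a slot w.p. k/(i+1)
            if j < k:
                sample[j] = x
    return sample
```

Each arriving element overwrites a uniformly chosen slot with probability $k/(i+1)$, which is exactly what keeps all inclusion probabilities equal after any prefix of the stream.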

Weighted reservoir sampling generalizes the selection process so that the sampled reservoir constitutes a valid basis for unbiased estimation. For each candidate element $\theta$ (e.g., a projection direction in distribution estimation), a random key $k_\theta = u^{1/w(\theta)}$ is generated, where $u$ is drawn uniformly from $(0,1)$. The reservoir then consists of the $k$ elements with the largest keys (equivalently, the smallest exponential keys $-\ln u / w(\theta)$), guaranteeing that each candidate survives in the reservoir with a probability that grows in proportion to $w(\theta)$ (Boss et al., 1 Oct 2025).
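A compact sketch of this key-based selection in the standard Efraimidis–Spirakis form (the heap bookkeeping is illustrative, not taken from the cited paper):

```python
import heapq
import random

def weighted_reservoir_sample(stream, k, rng=random.Random(0)):
    """Keep the k items whose keys u**(1/w) are largest, so heavier
    items are more likely to survive in the reservoir."""
    heap = []  # min-heap of (key, item); the root is the eviction candidate
    for item, w in stream:
        key = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]
```

Because $u^{1/w}$ is stochastically larger for larger $w$, a heavily weighted item almost always outranks a crowd of lightly weighted ones.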

2. Sliced Wasserstein Distance Estimation via Reservoir Sampling

Sliced Wasserstein Distance (SWD), defined for probability measures $\mu, \nu$ on $\mathbb{R}^d$ as

\operatorname{SWD}_p(\mu,\nu) = \left( \int_{\theta\in S^{d-1}} W_p^p(\langle \theta,\mu\rangle, \langle \theta,\nu\rangle) \, d\theta \right)^{1/p}

where $W_p$ denotes the 1D Wasserstein distance of the projected distributions, is a scalable proxy for high-dimensional Wasserstein metrics. Monte Carlo (MC) estimators approximate this integral by averaging over $L$ random projections $\theta_i \sim \mathrm{Uniform}(S^{d-1})$:

\operatorname{SWD}_p(\mu,\nu) \approx \left( \frac{1}{L} \sum_{i=1}^{L} W_p^p(\pi_{\theta_i}\mu, \pi_{\theta_i}\nu) \right)^{1/p}

Despite unbiasedness, such MC estimators exhibit variance $\sim \operatorname{Var}[W_p^p]/L$, which may remain prohibitive in optimization or learning contexts.
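The plain MC estimator can be sketched for empirical point clouds (a minimal illustration assuming equal-size clouds as the two measures, with uniform directions obtained by normalizing Gaussian draws):

```python
import numpy as np

def mc_sliced_wasserstein(X, Y, L=128, p=2, rng=None):
    """Monte Carlo SWD_p between uniform empirical measures on the rows
    of X and Y (same shape (n, d)): average W_p^p over L random
    directions, then take the 1/p power."""
    rng = rng or np.random.default_rng(0)
    theta = rng.normal(size=(L, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform on S^{d-1}
    # 1D W_p between equal-size empirical measures: match sorted projections
    px = np.sort(X @ theta.T, axis=0)
    py = np.sort(Y @ theta.T, axis=0)
    return float(np.mean(np.abs(px - py) ** p) ** (1.0 / p))
```

The sort-and-match step is why slicing scales: each 1D optimal transport problem costs only $O(n \log n)$.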

The Reservoir SWD (ReSWD) estimator (Boss et al., 1 Oct 2025) integrates weighted reservoir sampling into the SWD estimation process. At each iteration, a reservoir $\mathcal{R}$ of $K$ projection directions is constructed using weighted sampling, with $w(\theta)$ set to the current 1D Wasserstein cost $D(\theta) = W_p(\pi_{\theta}\mu, \pi_{\theta}\nu)$. Self-normalized importance weights are computed to yield the estimator

\widehat{S}_p(\mu,\nu) = \sum_{i=1}^K w_i \, D(\theta_i)

with $w_i = [1/q(\theta_i)] / \sum_j [1/q(\theta_j)]$, where $q(\theta_i) \propto D(\theta_i)$ is the marginal inclusion probability under reservoir sampling.
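In code, the self-normalization step is just a weighted average with weights inverse to the inclusion probabilities (known only up to a constant, which the normalization cancels); a minimal sketch:

```python
import numpy as np

def self_normalized_swd(costs, q):
    """Self-normalized importance-sampling estimate: given per-direction
    costs D(theta_i) and positive values q proportional to the inclusion
    probabilities, return sum_i w_i * D(theta_i)."""
    w = 1.0 / np.asarray(q, dtype=float)
    w /= w.sum()  # weights sum to one
    return float(np.sum(w * np.asarray(costs, dtype=float)))
```

With uniform $q$ this reduces to the plain MC average; with $q \propto D(\theta)$, high-cost directions are sampled more often but down-weighted accordingly, which is the source of the variance reduction.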

3. Variance-Optimal Reservoir Sampling in Subset-Sum Estimation

Variance-optimal reservoir sampling, as formalized in the varoptₖ scheme (0803.0473), targets the problem of maintaining a reservoir of $k$ weighted items to enable unbiased estimation of the total weight $S_M = \sum_{i\in M} w_i$ for any subset $M$ of items. The algorithm maintains adjusted weights $\hat w_i$ such that the Horvitz–Thompson estimator over any $M$,

\widehat S_M = \sum_{i \in S \cap M} \hat w_i,

yields $E[\widehat S_M] = S_M$. The scheme is characterized by:

  • Maintenance of a “threshold” $\tau_k$ determined by

\sum_{j \in S\cup\{\text{new}\}} \min\!\Bigl(1, \frac{\tilde w_j}{\tau_k}\Bigr) = k

  • Eviction and adjustment procedures guaranteeing that $|S| = k$ at all times.
  • Zero total-sum variance: $\operatorname{Var}[\widehat S_{[n]}] = 0$.
  • Optimal minimization of

V_m = E_{|M|=m}\bigl[\operatorname{Var}[\widehat S_M]\bigr]

across all subset sizes $m$, among all possible schemes with $k$ samples.
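The threshold condition can be solved exactly in one pass over the sorted weights; a hedged sketch (the incremental bookkeeping is illustrative, not the paper's implementation):

```python
def varopt_threshold(weights, k):
    """Solve sum_j min(1, w_j / tau) == k for tau, given more than k
    positive weights. Items with w >= tau are kept outright; the rest
    are subsampled with probability w / tau and, if kept, receive
    adjusted weight tau."""
    ws = sorted(weights, reverse=True)
    if len(ws) <= k:
        raise ValueError("need more than k weights")
    kept = 0        # items pinned at min(1, .) == 1 so far
    tail = sum(ws)  # total weight of the items not yet pinned
    for w in ws:
        tau = tail / (k - kept)
        if w <= tau:  # every remaining weight is below tau: solved
            return tau
        kept += 1
        tail -= w
    raise ValueError("weights must be positive")
```

Sorting descending, each candidate $\tau = \text{tail}/(k-\text{kept})$ assumes all remaining items fall below the threshold; the first candidate consistent with that assumption is the unique solution, since the left-hand side is monotone decreasing in $\tau$.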

The algorithm is efficient, offering $O(\log k)$ per-element update complexity and supporting merge operations for distributed or parallel data streams (0803.0473).

4. Unbiasedness, Variance Reduction, and Theoretical Guarantees

Reservoir-sampling-based estimators are provably unbiased. For SWD estimation, self-normalized importance sampling ensures

E[\widehat S_p] = E_{\theta\sim\mathrm{Uniform}}[W_p(\pi_{\theta} \mu, \pi_{\theta} \nu)]

while the empirical variance is reduced relative to plain MC estimation—empirically by up to 20–30% for a fixed number of projections (Boss et al., 1 Oct 2025).

In subset-sum estimation, varoptₖ yields strictly minimal average variance $V_m$ for all $m$ and supports tight worst-case bounds, such as

\operatorname{Var}[\hat w_i] \leq w_i W / k, \qquad \operatorname{Var}[\widehat S_M] \leq (W/k)\, S_M

where $W = \sum_i w_i$. Notably, covariance terms between adjusted weights are identically zero.

A key feature is composability in distributed contexts: varoptₖ reservoirs can be merged (using adjusted weights and threshold recomputation) to yield the same statistical guarantees as if the complete data stream had been processed sequentially (0803.0473).

5. Algorithms: Pseudocode Structure and Computational Complexity

The ReSWD update algorithm at optimization step $t$ involves:

  • (Optional) Time-decay of old keys: $k_i \gets k_i \cdot \exp(-\text{age}/\tau)$.
  • Drawing $M$ new directions $\theta$.
  • Computing costs $D(\theta)$ and keys $k(\theta)$ for the union of old and new directions.
  • Retaining the $K$ directions with the largest keys.
  • Computing inclusion probabilities $q_i$ and self-normalized weights $w_i$.
  • Performing an effective sample size (ESS) check to trigger reservoir resets if necessary.
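The steps above can be sketched as one update function (names and structure are illustrative, not the authors' API; inclusion probabilities are approximated as proportional to cost, as stated for $q(\theta)$, and the ESS reset is omitted):

```python
import numpy as np

def reswd_step(reservoir, cost_fn, K, M, d, rng, decay=None):
    """One ReSWD-style update. `reservoir` is a list of (key, theta)
    pairs; `cost_fn(theta)` returns a positive 1D Wasserstein cost
    D(theta). Returns the new reservoir and the SWD estimate."""
    if decay is not None:  # optional time-decay of old keys
        reservoir = [(key * np.exp(-1.0 / decay), th) for key, th in reservoir]
    # draw M fresh directions uniformly on the sphere
    new = rng.normal(size=(M, d))
    new /= np.linalg.norm(new, axis=1, keepdims=True)
    cand = list(reservoir)
    for th in new:
        w = max(cost_fn(th), 1e-12)            # weight = current cost
        cand.append((rng.random() ** (1.0 / w), th))
    cand.sort(key=lambda kv: kv[0], reverse=True)
    kept = cand[:K]                            # retain the K largest keys
    costs = np.array([cost_fn(th) for _, th in kept])
    q = costs / costs.sum()                    # q(theta) proportional to D(theta)
    w = 1.0 / q
    w /= w.sum()                               # self-normalized weights
    return kept, float(np.sum(w * costs))
```

Old directions keep their (decayed) keys rather than redrawing them, which is what lets informative projections persist across iterations at no extra sampling cost.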

Complexity per update is $O((K+M)\, n \log n)$ for data of size $n$, plus $O((K+M)\log K)$ for key sorting. In practice, setting $K + M \approx L$ (the MC sample size) yields asymptotic costs similar to plain SWD, with modest overhead for structural maintenance (Boss et al., 1 Oct 2025).

In varoptₖ, reservoir updates and merges are executed in $O(\log k)$ time, with amortized constant-time operations possible under specific implementations (0803.0473).

6. Empirical Performance and Applications

Empirical evaluation in (Boss et al., 1 Oct 2025) demonstrates that ReSWD achieves the lowest mean $W_1$ error in synthetic 3D-to-3D distribution matching at marginal computational cost (e.g., $0.622 \times 10^3$ $W_1$ error at $1.9$ ms/step vs. $0.670$–$0.733$ for baselines at $1.03$ ms/step). In vision and graphics tasks such as color correction and diffusion guidance, ReSWD delivers measurable improvements in error metrics (e.g., RMSE reduced from $0.34$ to $0.31$, PSNR increased from $24.30$ to $24.64$) and efficiency (guidance for SD3.5 Large and Turbo yields $2$–$4\times$ speedup and $30$–$45\%$ lower $W_2$ color distance) (Boss et al., 1 Oct 2025).

In streaming-data applications, varoptₖ is widely used for network traffic analysis, streaming subset-sum estimation, and distributed statistics, offering strict statistical optimality and efficient support for parallel and mergeable computation (0803.0473).

7. Practical Considerations and Limitations

Reservoir size ($K$ in ReSWD, $k$ in varoptₖ) and the number of new candidates per iteration ($M$) govern the balance between memory usage and adaptivity. Ablation studies in (Boss et al., 1 Oct 2025) suggest that $K = 64$, $M = 8$ delivers robust performance. Decay parameters allow adaptation to nonstationary data; effective-sample-size reset heuristics prevent estimator degeneration by enforcing periodic full redraws.

ReSWD’s variance advantages decrease when $\mu \approx \nu$ (costs $D(\theta)$ nearly uniform); in this regime, ReSWD reverts to standard SWD. For extensions beyond linear projections (e.g., learned kernels), no improvement was observed over random projections, attributed to the overwhelming dimensionality of the search space (Boss et al., 1 Oct 2025).

A plausible implication is that while reservoir-sampling-based estimation offers theoretically optimal, unbiased, and adaptive estimators for both streaming subset-sums and distribution matching objectives, its practical impact depends on task-specific parameterization and the structure of the data or distributions involved.
