
Reservoir-Sampling Distribution Estimation

Updated 10 February 2026
  • Reservoir-sampling-based distribution estimation is a framework that uses scalable, unbiased sampling to estimate distributions in high-dimensional or streaming-data scenarios.
  • Techniques such as ReSWD and varoptₖ integrate importance weighting and variance-optimality to efficiently estimate sliced Wasserstein distances and subset-sums.
  • The approach is applied in machine learning, computer vision, and network analysis, demonstrating empirical improvements in error reduction and computational speed.

Reservoir-sampling-based distribution estimation refers to a family of techniques that leverage the statistical properties of reservoir sampling—classically developed for scalable, unbiased sampling from data streams—in order to construct distribution estimators with provable variance guarantees and optimality properties. These methods are especially relevant in high-dimensional or streaming-data contexts, where memory and computation constraints preclude exact inference over the full data. Recent research has extended the classical reservoir sampling framework using importance weighting, variance-optimality criteria, and integration with measures such as sliced Wasserstein distances, enabling robust and scalable solutions to distribution matching and subset-sum estimation problems in statistics, machine learning, computer vision, and graphics.

1. Core Principles of Reservoir Sampling

Reservoir sampling is designed to maintain a sample of size $k$ from a potentially unbounded stream of data, such that every element seen thus far has a specified inclusion probability in the reservoir. For unweighted streams, classical one-pass algorithms such as Vitter's Algorithm R ensure uniform inclusion probability. For weighted streams, each item is assigned a nonnegative weight $w_i$ and inclusion probabilities must be proportional to $w_i$; the Efraimidis–Spirakis key-based method handles this case.
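As a concrete baseline, the unweighted one-pass scheme can be sketched as follows (a minimal illustration of Vitter's Algorithm R, not code from the cited papers):

```python
import random

def reservoir_sample(stream, k, rng=random.Random(0)):
    """Vitter's Algorithm R: maintain a uniform sample of size k
    from a stream of unknown length in a single pass."""
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            sample.append(x)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)  # item i replaces a slot w.p. k/(i+1)
            if j < k:
                sample[j] = x
    return sample
```

Each arriving element overwrites a uniformly chosen slot with probability $k/(i+1)$, which is exactly what keeps all inclusion probabilities equal after any prefix of the stream.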

Weighted reservoir sampling generalizes the selection process so that the sampled reservoir constitutes a valid basis for unbiased estimation. For each candidate element $\theta$ (e.g., a projection direction in distribution estimation), a random key $k_\theta = u^{1/w(\theta)}$ is generated, where $u$ is drawn uniformly from $(0,1)$. The reservoir then consists of the $k$ elements with the largest keys (equivalently, the smallest exponential keys $-\ln u / w(\theta)$), guaranteeing that each candidate survives in the reservoir with a probability that grows in proportion to $w(\theta)$ (Boss et al., 1 Oct 2025).
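A compact sketch of this key-based selection in the standard Efraimidis–Spirakis form (the heap bookkeeping is illustrative, not taken from the cited paper):

```python
import heapq
import random

def weighted_reservoir_sample(stream, k, rng=random.Random(0)):
    """Keep the k items whose keys u**(1/w) are largest, so heavier
    items are more likely to survive in the reservoir."""
    heap = []  # min-heap of (key, item); the root is the eviction candidate
    for item, w in stream:
        key = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]
```

Because $u^{1/w}$ is stochastically larger for larger $w$, a heavily weighted item almost always outranks a crowd of lightly weighted ones.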

2. Sliced Wasserstein Distance Estimation via Reservoir Sampling

Sliced Wasserstein Distance (SWD), defined for probability measures $\mu, \nu$ on $\mathbb{R}^d$ as

\operatorname{SWD}_p(\mu,\nu) = \left( \int_{\theta\in S^{d-1}} W_p^p(\langle \theta,\mu\rangle, \langle \theta,\nu\rangle) \, d\theta \right)^{1/p}

where $W_p$ denotes the 1D Wasserstein distance of the projected distributions, is a scalable proxy for high-dimensional Wasserstein metrics. Monte Carlo (MC) estimators approximate this integral by averaging over $L$ random projections $\theta_i \sim \mathrm{Uniform}(S^{d-1})$:

\operatorname{SWD}_p(\mu,\nu) \approx \left( \frac{1}{L} \sum_{i=1}^{L} W_p^p(\pi_{\theta_i}\mu, \pi_{\theta_i}\nu) \right)^{1/p}

Despite unbiasedness, such MC estimators exhibit variance $\sim \operatorname{Var}[W_p^p]/L$, which may remain prohibitive in optimization or learning contexts.
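The plain MC estimator can be sketched for empirical point clouds (a minimal illustration assuming equal-size clouds as the two measures, with uniform directions obtained by normalizing Gaussian draws):

```python
import numpy as np

def mc_sliced_wasserstein(X, Y, L=128, p=2, rng=None):
    """Monte Carlo SWD_p between uniform empirical measures on the rows
    of X and Y (same shape (n, d)): average W_p^p over L random
    directions, then take the 1/p power."""
    rng = rng or np.random.default_rng(0)
    theta = rng.normal(size=(L, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform on S^{d-1}
    # 1D W_p between equal-size empirical measures: match sorted projections
    px = np.sort(X @ theta.T, axis=0)
    py = np.sort(Y @ theta.T, axis=0)
    return float(np.mean(np.abs(px - py) ** p) ** (1.0 / p))
```

The sort-and-match step is why slicing scales: each 1D optimal transport problem costs only $O(n \log n)$.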

The Reservoir SWD (ReSWD) estimator (Boss et al., 1 Oct 2025) integrates weighted reservoir sampling into the SWD estimation process. At each iteration, a reservoir $\mathcal{R}$ of $K$ projection directions is constructed using weighted sampling, with $w(\theta)$ set to the current 1D Wasserstein cost $D(\theta) = W_p(\pi_{\theta}\mu, \pi_{\theta}\nu)$. Self-normalized importance weights are computed to yield the estimator

\widehat{S}_p(\mu,\nu) = \sum_{i=1}^K w_i \, D(\theta_i)

with $w_i = [1/q(\theta_i)] / \sum_j [1/q(\theta_j)]$, where $q(\theta_i) \propto D(\theta_i)$ is the marginal inclusion probability under reservoir sampling.
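In code, the self-normalization step is just a weighted average with weights inverse to the inclusion probabilities (known only up to a constant, which the normalization cancels); a minimal sketch:

```python
import numpy as np

def self_normalized_swd(costs, q):
    """Self-normalized importance-sampling estimate: given per-direction
    costs D(theta_i) and positive values q proportional to the inclusion
    probabilities, return sum_i w_i * D(theta_i)."""
    w = 1.0 / np.asarray(q, dtype=float)
    w /= w.sum()  # weights sum to one
    return float(np.sum(w * np.asarray(costs, dtype=float)))
```

With uniform $q$ this reduces to the plain MC average; with $q \propto D(\theta)$, high-cost directions are sampled more often but down-weighted accordingly, which is the source of the variance reduction.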

3. Variance-Optimal Reservoir Sampling in Subset-Sum Estimation

Variance-optimal reservoir sampling, as formalized in the varoptₖ scheme (0803.0473), targets the problem of maintaining a reservoir of $k$ weighted items to enable unbiased estimation of the total weight $S_M = \sum_{i\in M} w_i$ for any subset $M$ of items. The algorithm maintains adjusted weights $\hat w_i$ such that the Horvitz–Thompson estimator over any $M$,

\widehat S_M = \sum_{i \in S \cap M} \hat w_i,

yields $E[\widehat S_M] = S_M$. The scheme is characterized by:

  • Maintenance of a “threshold” $\tau_k$ determined by

\sum_{j \in S\cup\{\text{new}\}} \min\!\Bigl(1, \frac{\tilde w_j}{\tau_k}\Bigr) = k

  • Eviction and adjustment procedures guaranteeing that $|S| = k$ at all times.
  • Zero total-sum variance: $\operatorname{Var}[\widehat S_{[n]}] = 0$.
  • Optimal minimization of

V_m = E_{|M|=m}\bigl[\operatorname{Var}[\widehat S_M]\bigr]

across all subset sizes $m$, among all possible schemes with $k$ samples.
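The threshold condition can be solved exactly in one pass over the sorted weights; a hedged sketch (the incremental bookkeeping is illustrative, not the paper's implementation):

```python
def varopt_threshold(weights, k):
    """Solve sum_j min(1, w_j / tau) == k for tau, given more than k
    positive weights. Items with w >= tau are kept outright; the rest
    are subsampled with probability w / tau and, if kept, receive
    adjusted weight tau."""
    ws = sorted(weights, reverse=True)
    if len(ws) <= k:
        raise ValueError("need more than k weights")
    kept = 0        # items pinned at min(1, .) == 1 so far
    tail = sum(ws)  # total weight of the items not yet pinned
    for w in ws:
        tau = tail / (k - kept)
        if w <= tau:  # every remaining weight is below tau: solved
            return tau
        kept += 1
        tail -= w
    raise ValueError("weights must be positive")
```

Sorting descending, each candidate $\tau = \text{tail}/(k-\text{kept})$ assumes all remaining items fall below the threshold; the first candidate consistent with that assumption is the unique solution, since the left-hand side is monotone decreasing in $\tau$.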

The algorithm is efficient, offering $O(\log k)$ per-element update complexity and supporting merge operations for distributed or parallel data streams (0803.0473).

4. Unbiasedness, Variance Reduction, and Theoretical Guarantees

Reservoir-sampling-based estimators are provably unbiased. For SWD estimation, self-normalized importance sampling ensures

E[\widehat S_p] = E_{\theta\sim\mathrm{Uniform}}[W_p(\pi_{\theta} \mu, \pi_{\theta} \nu)]

while the empirical variance is reduced relative to plain MC estimation—empirically by up to 20–30% for a fixed number of projections (Boss et al., 1 Oct 2025).

In subset-sum estimation, varoptₖ yields strictly minimal average variance $V_m$ for all $m$ and supports tight worst-case bounds, such as

\operatorname{Var}[\hat w_i] \leq w_i W / k, \qquad \operatorname{Var}[\widehat S_M] \leq (W/k)\, S_M

where $W = \sum_i w_i$. Notably, covariance terms between adjusted weights are identically zero.

A key feature is composability in distributed contexts: varoptₖ reservoirs can be merged (using adjusted weights and threshold recomputation) to yield the same statistical guarantees as if the complete data stream had been processed sequentially (0803.0473).

5. Algorithms: Pseudocode Structure and Computational Complexity

The ReSWD update algorithm at optimization step $t$ involves:

  • (Optional) Time-decay of old keys: $k_i \gets k_i \cdot \exp(-\text{age}/\tau)$.
  • Drawing $M$ new directions $\theta$.
  • Computing costs $D(\theta)$ and keys $k(\theta)$ for the union of old and new directions.
  • Retaining the $K$ directions with the largest keys.
  • Computing inclusion probabilities $q_i$ and self-normalized weights $w_i$.
  • Performing an effective sample size (ESS) check to trigger reservoir resets if necessary.
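The steps above can be sketched as one update function (names and structure are illustrative, not the authors' API; inclusion probabilities are approximated as proportional to cost, as stated for $q(\theta)$, and the ESS reset is omitted):

```python
import numpy as np

def reswd_step(reservoir, cost_fn, K, M, d, rng, decay=None):
    """One ReSWD-style update. `reservoir` is a list of (key, theta)
    pairs; `cost_fn(theta)` returns a positive 1D Wasserstein cost
    D(theta). Returns the new reservoir and the SWD estimate."""
    if decay is not None:  # optional time-decay of old keys
        reservoir = [(key * np.exp(-1.0 / decay), th) for key, th in reservoir]
    # draw M fresh directions uniformly on the sphere
    new = rng.normal(size=(M, d))
    new /= np.linalg.norm(new, axis=1, keepdims=True)
    cand = list(reservoir)
    for th in new:
        w = max(cost_fn(th), 1e-12)            # weight = current cost
        cand.append((rng.random() ** (1.0 / w), th))
    cand.sort(key=lambda kv: kv[0], reverse=True)
    kept = cand[:K]                            # retain the K largest keys
    costs = np.array([cost_fn(th) for _, th in kept])
    q = costs / costs.sum()                    # q(theta) proportional to D(theta)
    w = 1.0 / q
    w /= w.sum()                               # self-normalized weights
    return kept, float(np.sum(w * costs))
```

Old directions keep their (decayed) keys rather than redrawing them, which is what lets informative projections persist across iterations at no extra sampling cost.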

Complexity per update is $O((K+M)\, n \log n)$ for data of size $n$, plus $O((K+M)\log K)$ for key sorting. In practice, setting $K + M \approx L$ (the MC sample size) yields asymptotic costs similar to plain SWD, with modest overhead for structural maintenance (Boss et al., 1 Oct 2025).

In varoptₖ, reservoir updates and merges are executed in $O(\log k)$ time, with amortized constant-time operations possible under specific implementations (0803.0473).

6. Empirical Performance and Applications

Empirical evaluation in (Boss et al., 1 Oct 2025) demonstrates that ReSWD achieves the lowest mean $W_1$ error in synthetic 3D-to-3D distribution matching at marginal computational cost (e.g., $0.622 \times 10^3$ $W_1$ error at $1.9$ ms/step vs. $0.670$–$0.733$ for baselines at $1.03$ ms/step). In vision and graphics tasks such as color correction and diffusion guidance, ReSWD delivers measurable improvements in error metrics (e.g., RMSE reduced from $0.34$ to $0.31$, PSNR increased from $24.30$ to $24.64$) and efficiency (guidance for SD3.5 Large and Turbo yields $2$–$4\times$ speedup and $30$–$45\%$ lower $W_2$ color distance) (Boss et al., 1 Oct 2025).

In streaming-data applications, varoptₖ is widely used for network traffic analysis, streaming subset-sum estimation, and distributed statistics, offering strict statistical optimality and efficient support for parallel and mergeable computation (0803.0473).

7. Practical Considerations and Limitations

Reservoir size ($K$ in ReSWD, $k$ in varoptₖ) and the number of new candidates per iteration ($M$) govern the balance between memory usage and adaptivity. Ablation studies in (Boss et al., 1 Oct 2025) suggest that $K = 64$, $M = 8$ delivers robust performance. Decay parameters allow adaptation to nonstationary data; effective-sample-size reset heuristics prevent estimator degeneration by enforcing periodic full redraws.

ReSWD’s variance advantages decrease when $\mu \approx \nu$ (costs $D(\theta)$ nearly uniform); in this regime, ReSWD reverts to standard SWD. For extensions beyond linear projections (e.g., learned kernels), no improvement was observed over random projections, attributed to the overwhelming dimensionality of the search space (Boss et al., 1 Oct 2025).

A plausible implication is that while reservoir-sampling-based estimation offers theoretically optimal, unbiased, and adaptive estimators for both streaming subset-sums and distribution matching objectives, its practical impact depends on task-specific parameterization and the structure of the data or distributions involved.
