Instance-Dependent Sampling Methods
- Instance-dependent sampling methods are dynamic strategies that tailor sampling probabilities to the specific features, structure, or difficulty profile of each data instance.
- They improve statistical efficiency and reduce estimator variance by leveraging partial observations, model feedback, and adaptive importance techniques.
- These methods are foundational in machine learning, reinforcement learning, and adaptive data acquisition, offering rigorous, instance-specific performance guarantees.
Instance-dependent sampling methods are a class of stochastic sampling techniques where the selection and/or weighting of samples is dynamically tailored to the specific features, structure, or difficulty profile of each data instance, task subproblem, or underlying data distribution. Unlike fixed (instance-independent) strategies, instance-dependent samplers seek to adaptively optimize statistical efficiency, estimator variance, or model-training dynamics by leveraging feedback from partially observed data, model predictions, or latent contextual properties. These techniques are foundational in areas such as efficient data summarization, learning under label noise, reinforcement learning, adaptive importance sampling, and object detection, and are characterized by rigorous instance-adaptive performance guarantees and estimator design.
1. Fundamental Principles of Instance-Dependent Sampling
Instance-dependent sampling contrasts with classical random or importance sampling by explicitly adapting acquisition probabilities or estimator construction on a per-instance or per-query basis. This adaptation can exploit partial observations (e.g., upper or lower bounds on hidden values as in multi-instance sampling), local difficulty (e.g., instance-dependent label noise), or dynamic model feedback (e.g., sample sieve or hard-negative mining). The canonical goals are to achieve optimality with respect to each individual problem instance, reduce variance in estimation, or improve data or computation efficiency in large or noisy regimes.
Two key axes of instance-dependence appear in the literature:
- Sampling Distribution Adaptation: The probability law from which samples are drawn is iteratively tuned based on instance-specific information. For example, adaptive importance sampling methods use observed weights or variance gradients to move the proposal distribution closer to the instance's optimal sampler (Ortiz et al., 2013).
- Estimator Design via Partial or Structured Observations: The estimator exploits all available partial information revealed by the sampling outcome on each instance, as in multi-instance sampling where partial sampling outcomes yield set-valued constraints on hidden data (Cohen et al., 2011).
Instance-dependent methods intentionally move beyond worst-case analysis, focusing on maximizing efficiency or estimator properties for each realized configuration, not just on average.
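The contrast between fixed and instance-adapted acquisition probabilities can be made concrete with a Horvitz-Thompson estimator. The sketch below is a toy illustration, not a construction from any of the cited papers: the `values` vector, the sampling `budget`, and the probability-proportional-to-size (PPS) rule are all assumptions chosen to show how tailoring inclusion probabilities to a skewed instance reduces estimator variance.

```python
import random

def ht_estimate(values, probs, rng):
    # Horvitz-Thompson estimator of sum(values): item i is included with
    # probability probs[i]; observed items are inverse-probability weighted,
    # which keeps the estimate unbiased for any inclusion probabilities.
    return sum(v / p for v, p in zip(values, probs) if rng.random() < p)

rng = random.Random(0)
values = [1, 1, 1, 1, 100]               # a skewed toy instance
budget = 2.0                             # expected number of sampled items

# Instance-independent: the same uniform inclusion probability everywhere.
uniform = [budget / len(values)] * len(values)
# Instance-dependent: probability proportional to size (PPS), capped at 1,
# so the dominant item is always observed on this instance.
total = sum(values)
pps = [min(1.0, budget * v / total) for v in values]

def empirical_var(probs, trials=20000):
    ests = [ht_estimate(values, probs, rng) for _ in range(trials)]
    mean = sum(ests) / trials
    return sum((e - mean) ** 2 for e in ests) / trials

var_uniform, var_pps = empirical_var(uniform), empirical_var(pps)
print(var_pps < var_uniform)  # True: PPS adapts to the instance and wins
```

Both samplers have the same expected cost; only the instance-dependent one places its probability mass where this particular instance concentrates its weight.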
2. Multi-Instance and Partial Information Sampling
A highly developed example of instance-dependent methodology is in the setting where data are collected over multiple instances per key (e.g., across time, sensors, logs). The work of Cohen & Kaplan defines a rigorous framework for Pareto-optimal unbiased estimators over arbitrary sampling outcomes that may provide only partial information for a given multi-dimensional key (Cohen et al., 2011).
- Model: For r instances, each key h is associated with an r-dimensional value vector v(h), one coordinate per instance. Sampling proceeds independently per instance, and the observed outcome S(h) reveals which coordinates were sampled, together with any partial constraints on the rest (e.g., that an unsampled value lies below the sampling threshold in weighted schemes).
- Estimation: Rather than discarding incomplete instances, estimators are expressly designed to be:
- Unbiased: the estimator's expectation equals f(v(h)) for the target function f, over the randomness of the sampling outcome.
- Pareto-optimal in variance: no unbiased estimator has lower variance everywhere.
- Monotone in revealed information: more information never increases the estimate.
- Methodology: These objectives are met by order-based estimator assignments and, if needed, quadratic programming to ensure nonnegativity. Specific constructions (e.g., for max, OR, range queries) demonstrate substantial variance improvement over fixed-inclusion classical estimators.
Empirical results show that leveraging all revealed partial constraints for each instance can reduce sample size by up to 50% for the same coefficient of variation in distinct-count queries, and achieve 2-3x variance reductions for max-dominance queries (Cohen et al., 2011).
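The role of the nonnegativity machinery above can be seen in a deliberately naive estimator. The sketch below is not the Pareto-optimal construction of Cohen et al.; it is the simplest unbiased estimator of an OR over r independently sampled bit instances, shown here because it demonstrates both unbiasedness from partial observations and the negativity problem that the order-based, QP-corrected estimators are designed to avoid.

```python
import random

def or_estimate(bits, p, rng):
    # Naive unbiased estimator of OR(bits) = 1 - prod_i (1 - b_i) from
    # independent Bernoulli(p) sampling of the instances: (1 - b_i)/p on a
    # sampled instance (0 otherwise) is unbiased for each factor, and
    # independence across instances makes the product, hence the whole
    # estimate, unbiased.
    prod = 1.0
    for b in bits:
        prod *= (1 - b) / p if rng.random() < p else 0.0
    return 1.0 - prod

rng = random.Random(1)
bits, p, trials = [0, 0, 0], 0.5, 200000   # true OR is 0
mean = sum(or_estimate(bits, p, rng) for _ in range(trials)) / trials
print(abs(mean) < 0.05)   # True: the empirical mean is near the true OR
# Whenever all three instances happen to be sampled, this estimator takes
# the value 1 - (1/p)^3 = -7: unbiased, but not nonnegative.
```

The negative excursions are exactly what motivates the quadratic-programming step in the text above, which trades them away while preserving unbiasedness.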
3. Instance-Optimal Adaptive Algorithms
Recent work has formalized instance-optimality in sampling and estimation, particularly in I/O-efficient data access and sequential estimation (Narayanan et al., 2024):
- Instance-Optimality: An algorithm is instance-optimal if, for every input instance, its expected sampling cost matches that of any other (order-oblivious) adaptive algorithm up to a universal constant.
- Algorithm Design: A two-phase adaptive sampling algorithm estimates distributional variance on-the-fly and determines, per instance, when enough data have been collected to guarantee the required estimation error with high probability. The sample complexity explicitly adapts to instance-specific variance (for means) or instance gap parameters (quantiles, histogram modes).
- Theoretical Matching Bounds: Lower bounds match these adaptive algorithms, establishing that on the order of σ²/ε² samples are both necessary and sufficient to estimate the mean of an instance with variance σ² to precision ε, with direct extensions for quantiles, histograms, and mixture models (Narayanan et al., 2024).
This framework allows rigorous, instance-by-instance exploitation of "easier" data configurations, automatically reducing computation where signal concentration or low-variance regimes make less sampling sufficient.
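The two-phase idea can be sketched in a few lines. This is a simplified illustration, not the algorithm from Narayanan et al.: the pilot size and the Chebyshev-style constant `c = 20` are assumptions, chosen only to show how the sample complexity automatically tracks each instance's variance.

```python
import random
import statistics

def adaptive_mean(draw, eps, pilot=200):
    # Phase 1: a pilot sample estimates this instance's variance.
    xs = [draw() for _ in range(pilot)]
    var = statistics.variance(xs)
    # Phase 2: a Chebyshev-style sample size c * var / eps^2, adapted to
    # the instance (the constant c = 20 is an assumption, not from the paper).
    n = max(pilot, int(20 * var / eps ** 2))
    xs += [draw() for _ in range(n - pilot)]
    return sum(xs) / len(xs), n

rng = random.Random(2)
easy = lambda: rng.gauss(5.0, 0.1)   # low-variance "easy" instance
hard = lambda: rng.gauss(5.0, 3.0)   # high-variance "hard" instance
mean_easy, n_easy = adaptive_mean(easy, eps=0.05)
mean_hard, n_hard = adaptive_mean(hard, eps=0.05)
print(n_easy < n_hard)   # True: sample complexity adapts to instance variance
```

On the easy instance the pilot alone already suffices, while the hard instance triggers a sample size hundreds of times larger, which is the per-instance adaptivity the matching bounds certify.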
4. Instance-Dependent Sampling in Machine Learning
Numerous recent machine learning developments have leveraged instance-dependent sampling for training in challenging or weakly supervised setups:
- Label Noise Robustness: CORES (Confidence REgularized Sample Sieve) progressively sieves training samples with instance-dependent label noise, filtering out high-loss (unconfident) points using an adaptive, per-sample threshold derived from a confidence-regularized objective (Cheng et al., 2020). This mechanism provably guarantees high-precision separation of clean and noisy data without explicit estimation of noise rates.
- Object Detection: In weakly supervised detection, OPIS (Online Progressive Instance-balanced Sampling) combines IoU-balanced negative mining and classifier-score-based reweighting to mitigate the dominance of negative instances and mine progressively harder negatives (Chen et al., 2022). IQDet introduces quality-distribution estimation via per-instance GMMs, sampling training points proportionally to the localized expected IoU or quality, minimizing the impact of priors or poorly aligned heuristics (Ma et al., 2021).
- 3D Instance Segmentation: In point cloud segmentation, ISBNet uses instance-aware farthest point sampling to guarantee each object instance is seeded, integrating both semantic and geometric cues for candidate selection (Ngo et al., 2023).
These methods empirically and theoretically outperform uniform or heuristic sampling, providing gains in mean AP, training stability, and noise robustness.
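The per-sample sieve idea can be illustrated with a simplified decision rule. This is a sketch in the spirit of CORES, not the paper's exact confidence-regularized objective: the rule below (keep a sample when its cross-entropy loss on the given label is below `alpha` times the average loss over all classes) and the `alpha = 1.0` margin are assumptions for illustration.

```python
import math

def cores_keep(probs, y, alpha=1.0):
    # Simplified per-sample sieve: keep a sample if its cross-entropy loss
    # on the (possibly noisy) label y is below alpha times the average loss
    # over all classes, i.e. the model is relatively confident in y.
    # The real CORES criterion uses a confidence-regularized objective;
    # this margin test is only a sketch of the mechanism.
    losses = [-math.log(max(p, 1e-12)) for p in probs]
    return losses[y] < alpha * sum(losses) / len(losses)

probs = [0.9, 0.05, 0.05]            # model prediction for one sample
print(cores_keep(probs, 0))          # True: plausible clean label passes
print(cores_keep(probs, 2))          # False: likely-noisy label is sieved out
```

Because the threshold is computed per sample from the model's own predictions, no global noise-rate estimate is needed, which mirrors the property highlighted above.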
5. Adaptive Importance Sampling and Sequential Methods
Adaptive importance sampling methodologies systematically update the sampling distribution online to approach the instance's optimal proposal distribution (Ortiz et al., 2013). The formalism centers around minimizing variance or divergence to the optimal measure relevant for the current integral or graphical model evidence being estimated.
- Algorithms: Stochastic gradient methods update the parameters of the proposal to directly minimize estimator variance, or KL/L2 divergence relative to the evolving empirical estimate of the optimal proposal. Iterates remain unbiased, and convergence (under standard conditions) is to a locally optimal sampler for the target instance.
- Empirical Effects: In structured estimation tasks (e.g., influence diagram action evaluation), these adaptive schemes dramatically lower mean-squared error and sampling variance compared to any instance-independent (fixed) proposal.
Instance-adaptive updating is critical in high-dimensional, highly variable, or nonstationary contexts where fixed sampling cannot exploit localized structure or integrand importance.
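A minimal 1D sketch shows the adaptation loop. The update below is a standard moment-matching (cross-entropy-style) rule rather than the stochastic gradient scheme of Ortiz et al., and the integrand, proposal family, and iteration counts are all toy assumptions; the point is only that the proposal migrates toward this instance's optimal sampler.

```python
import math
import random

rng = random.Random(3)
TARGET = 3.0
f = lambda x: math.exp(-0.5 * (x - TARGET) ** 2)   # unnormalized integrand

def q_pdf(x, theta):
    # Gaussian N(theta, 1) proposal density.
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)

theta = -2.0   # badly initialized proposal mean
for _ in range(30):
    xs = [rng.gauss(theta, 1.0) for _ in range(500)]
    ws = [f(x) / q_pdf(x, theta) for x in xs]
    # Moment-matching adaptation: move the proposal mean toward the
    # self-normalized weighted sample mean. (Ortiz et al. instead take
    # stochastic gradient steps on variance/KL objectives.)
    theta = sum(w * x for w, x in zip(ws, xs)) / sum(ws)

# Estimate Z = integral of f, which equals sqrt(2*pi) here, with the
# adapted proposal; fresh samples are drawn under the final theta.
xs = [rng.gauss(theta, 1.0) for _ in range(2000)]
Z = sum(f(x) / q_pdf(x, theta) for x in xs) / len(xs)
print(round(theta, 1), round(Z, 2))   # theta near 3.0, Z near 2.51
```

Once theta reaches the integrand's mode, the importance weights become nearly constant and the estimator's variance collapses, which is the instance-specific payoff a fixed proposal centered at -2 could never achieve.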
6. Reinforcement Learning and Active Data Acquisition
Instance-dependent sample complexity analysis is now central in reinforcement learning (RL). The BPI-UCRL algorithm for PAC RL provides the first instance-dependent sample-complexity bound for optimistic exploration, incorporating minimal visitation probability and a refined conditional return gap that account for the stochastic and structural idiosyncrasies of each MDP instance (Tirinzoni et al., 2022). Empirically, this delivers strictly tighter identification time for easy MDPs or those with deterministic transitions.
The instance-dependent analysis here reveals that naively re-purposing regret-based exploration schemes for PAC identification can be exponentially suboptimal for some problem instances, justifying the need for per-instance-optimized exploration.
7. Comparative Analysis and Empirical Impact
Across domains, the consistent empirical finding is that instance-dependent sampling methods:
- Exploit the "easiness" or structural information of specific instances, reducing unnecessary sample acquisition.
- Achieve statistically optimal or near-optimal variance, estimation error, or training accuracy on a per-instance basis.
- Substantially outperform any fixed-rate or instance-agnostic heuristics in real-world data regimes characterized by heavy-tailed, clustered, or noisy distributions.
These results establish instance-dependent sampling as central to best-in-class estimation and learning in large-scale, partially observed, or adversarial environments. Prominent algorithms include Pareto-optimal estimators for multi-instance queries (Cohen et al., 2011), two-phase adaptive block samplers (Narayanan et al., 2024), adaptive importance sampling (Ortiz et al., 2013), sample sieves for label-noise (Cheng et al., 2020), and instance-aware mini-batch samplers in detection and segmentation (Ma et al., 2021, Ngo et al., 2023, Chen et al., 2022).
References
- "Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information" (Cohen et al., 2011)
- "Instance-Optimality in I/O-Efficient Sampling and Sequential Estimation" (Narayanan et al., 2024)
- "Adaptive Importance Sampling for Estimation in Structured Domains" (Ortiz et al., 2013)
- "Online progressive instance-balanced sampling for weakly supervised object detection" (Chen et al., 2022)
- "IQDet: Instance-wise Quality Distribution Sampling for Object Detection" (Ma et al., 2021)
- "ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling..." (Ngo et al., 2023)
- "Learning with Instance-Dependent Label Noise: A Sample Sieve Approach" (Cheng et al., 2020)
- "Optimistic PAC Reinforcement Learning: the Instance-Dependent View" (Tirinzoni et al., 2022)