Samplable PAC Learning & Evasive Sets
- Samplable PAC learning is a variant of the PAC framework that restricts attention to efficiently samplable data distributions, which can dramatically reduce sample complexity.
- The approach uses explicit evasive sets to achieve exponential separations from standard PAC learning, even when the VC-dimension is exponentially large.
- By integrating cryptographic assumptions and computationally bounded adversaries, the framework also yields computational and online-learning separations between bounded and unbounded settings.
Samplable PAC learning is a refinement of the classical Probably Approximately Correct (PAC) learning framework, in which the requirement to learn under all possible data distributions is relaxed to learning under distributions that are efficiently samplable. This modification, first explicitly formalized by Blum, Furst, Kearns, and Lipton (1993), has significant implications for both the statistical and computational complexity of learning, leading to new separations and open questions regarding the true nature of learnability when data-generation processes are algorithmically constrained (Blanc et al., 1 Dec 2025).
1. Formal Definitions and Framework
In the standard PAC model (Valiant, 1984), a concept class $\mathcal{C}$ is PAC-learnable if there is a learner $A$ that, given labeled examples drawn i.i.d. from any distribution $\mathcal{D}$ over the instance space and labeled by any target $f \in \mathcal{C}$, outputs a hypothesis $h$ such that $\Pr_{x \sim \mathcal{D}}[h(x) \neq f(x)] \le \varepsilon$ with probability at least $1 - \delta$,
for all $\varepsilon, \delta > 0$, all distributions $\mathcal{D}$, and all targets $f \in \mathcal{C}$.
The samplable PAC model retains this structure but only requires $A$ to succeed when $\mathcal{D}$ is efficiently samplable: that is, there exists a Boolean circuit $C$ of size at most $s(n)$ such that $C$, on uniformly random input bits, induces $\mathcal{D}$. The learner must succeed for all $\mathcal{D}$ samplable by circuits of size at most $s(n)$. The error metric remains $\Pr_{x \sim \mathcal{D}}[h(x) \neq f(x)]$ (Blanc et al., 1 Dec 2025).
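The sampler-restricted interaction above can be sketched in a few lines of Python. This is an illustrative toy, not the paper's construction: the `sampler` function stands in for a bounded-size circuit $C$ mapping random bits to instances, and the target concept is a hypothetical placeholder.

```python
import random

def sampler(r):
    """Toy 'sampler circuit': maps random bits r to an instance in {0,1}^8.
    Stands in for a bounded-size Boolean circuit C inducing D = C(U_m)."""
    n = 8
    x = [0] * n
    # Hypothetical sampler: only the first n//2 coordinates vary, so the
    # induced distribution D is concentrated on a small subcube.
    for i, bit in enumerate(r[: n // 2]):
        x[i] = bit
    return tuple(x)

def draw_examples(target, m, num_random_bits=16):
    """Draw m labeled examples (x, f(x)) with x ~ D induced by the sampler."""
    sample = []
    for _ in range(m):
        r = [random.randint(0, 1) for _ in range(num_random_bits)]
        x = sampler(r)
        sample.append((x, target(x)))
    return sample

# Hypothetical target concept: indicator of the all-zeros string.
f = lambda x: int(all(b == 0 for b in x))
data = draw_examples(f, 5)
```

The learner only ever sees instances that the circuit can produce, which is exactly the relaxation the samplable model exploits.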
2. Statistical Separations from Standard PAC Learning
A principal finding is the existence of concept classes for which samplable PAC learning is exponentially more powerful than standard PAC. Specifically, there exist classes $\mathcal{C}_n$ over $\{0,1\}^n$ such that:
- The VC-dimension of $\mathcal{C}_n$ is exponential in $n$, so standard PAC learning requires $2^{\Omega(n)}$ samples.
- The same class is learnable in the samplable PAC model with polynomial sample complexity.
This is realized through the construction of an evasive set $S \subseteq \{0,1\}^n$, defined so that every size-$s(n)$ samplable distribution "misses" $S$ outside of its heaviest points: for some polynomially bounded $k(n)$, all but a negligible fraction of the mass that $\mathcal{D}$ places on $S$ falls on at most $k(n)$ points. The corresponding concept class consists of the indicators of subsets $T \subseteq S$: $f_T(x) = 1$ if $x \in T$, $0$ otherwise. While its VC-dimension remains large, the evasiveness property ensures that, against samplable $\mathcal{D}$, memorization-based learners cover nearly all of $\mathcal{D}$'s mass using a small sample (Blanc et al., 1 Dec 2025).
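The memorization argument can be illustrated with a toy simulation. All names here are hypothetical stand-ins: `S` plays the role of the hard set, and the sampler, by assumption, can only reach a handful of its points, so a memorize-then-default-to-zero learner attains essentially zero error.

```python
import random

# Hypothetical stand-in for an evasive set: the toy sampler below can only
# ever reach three of its points (its "heaviest points").
S = {(i, i * i % 97) for i in range(50)}   # 50 "hard" points
reachable = [(1, 1), (2, 4), (3, 9)]       # the points a small sampler hits

def sampler():
    # Samplable D: all mass on three points of S.
    return random.choice(reachable)

def memorize_learner(sample):
    """Memorization learner: predict the seen label, default to 0 elsewhere."""
    table = dict(sample)
    return lambda x: table.get(x, 0)

target = lambda x: int(x in S)             # concept = indicator of a subset of S
train = []
for _ in range(100):
    x = sampler()
    train.append((x, target(x)))
h = memorize_learner(train)

# Test error against fresh draws from the same samplable distribution.
errors = sum(h(x) != target(x) for x in (sampler() for _ in range(1000)))
```

Although `S` has 50 points (and the induced concept class has high VC-dimension over `S`), the learner needs to memorize only the few points the sampler can reach.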
3. Explicit Evasive Sets and Efficient Learnability
Explicit evasive sets are sets $S \subseteq \{0,1\}^n$ characterized by:
- Membership in $S$ is decidable by a circuit of size $\mathrm{poly}(n)$.
- $|S|$ is super-polynomial in $n$.
- For every constant $c$, every size-$n^c$ sampler $k(n)$-misses $S$ for some polynomially bounded $k(n)$.
Such sets, if constructed, provide both hardness and efficient recognizability, which is crucial for sharp separations between standard and samplable PAC learnability. Their existence is connected to core complexity-theoretic hypotheses: if small circuits can existentially search for members of $S$, sampling can concentrate on fresh points of $S$, making $S$ non-evasive. In the random oracle model, explicit sets $S$ can be produced that are evasive for all polynomial-size oracle samplers, with $|S|$ super-polynomial in $n$ (Blanc et al., 1 Dec 2025).
4. Computational Separations and Reductions
The study extends beyond sample complexity to computational complexity:
- Let $\{f_k\}$ be a pseudorandom function (PRF) family secure against polynomial-size circuits; define the class of concepts $g_k$, where $g_k(x) = f_k(x)$ if $x \in S$, $0$ otherwise.
- Learning this class with respect to the uniform distribution on $S$ requires breaking the PRF, which is infeasible in polynomial time (under standard cryptographic assumptions).
- However, by the evasiveness of $S$, the class becomes learnable in polynomial time for all polynomial-size samplable distributions, as almost all of the probability mass falls on a small, easily memorizable subset.
This separation persists relative to a random oracle, demonstrating unconditional distinctions between standard and samplable PAC in oracle models (Blanc et al., 1 Dec 2025).
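The shape of the PRF-based class $g_k$ can be sketched concretely. This is a minimal illustration, not the paper's construction: HMAC-SHA256 is used here as a convenient PRF stand-in, and the tiny set `S` is a hypothetical placeholder for the evasive set.

```python
import hmac, hashlib

def prf_bit(key: bytes, x: bytes) -> int:
    """PRF stand-in: low-order bit of HMAC-SHA256 (illustrative only)."""
    return hmac.new(key, x, hashlib.sha256).digest()[0] & 1

# Hypothetical 'evasive' domain subset (a real one would be super-polynomial
# in size yet rarely hit by any small sampler).
S = {bytes([i]) for i in range(16)}

def concept(key):
    """g_k(x) = f_k(x) if x in S, else 0 -- the separation construction's shape."""
    return lambda x: prf_bit(key, x) if x in S else 0

g = concept(b"secret-key")
```

Under the uniform distribution on `S`, predicting `g` amounts to predicting PRF output bits, which is the source of the computational hardness; off `S`, the concept is trivially zero, which is what a memorization learner exploits against samplable distributions.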
5. Online Learning and Adversarial Efficiency
The samplable PAC principle translates to online learning models. In the classic online framework (Littlestone, 1988), an adversary supplies the instances; with no computational restriction, the adversary can force exponentially many mistakes, corresponding to an exponentially large Littlestone dimension. If the adversary is limited to producing instances via polynomial-size circuits (an efficient adversary), then for the same concept classes the best online learner's mistake bound drops to $\mathrm{poly}(n)$. A "default-zero" memorization learner achieves this low mistake bound because an efficient adversary can only produce a bounded number of distinct "hard" points in an evasive set (Blanc et al., 1 Dec 2025).
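The default-zero learner described above admits a short simulation. The adversary model here is a hypothetical toy: it is assumed to be able to produce only three distinct hard (positive) points, mirroring the bound an evasive set imposes on efficient adversaries.

```python
def default_zero_online(stream):
    """'Default-zero' learner: predict 0 until a point is revealed as
    positive, then remember it. Mistakes <= number of distinct positives."""
    positives, mistakes = set(), 0
    for x, true_label in stream:
        pred = 1 if x in positives else 0
        if pred != true_label:
            mistakes += 1
        if true_label == 1:
            positives.add(x)   # memorize the revealed hard point
    return mistakes

# Bounded adversary: only 3 distinct 'hard' (positive) points available,
# replayed many times, interleaved with easy negatives.
hard = ["a", "b", "c"]
stream = [(hard[t % 3], 1) for t in range(30)] + \
         [("easy%d" % t, 0) for t in range(30)]
mistakes = default_zero_online(stream)
```

The learner errs exactly once per distinct hard point and never on negatives, so its mistake count is bounded by the adversary's budget rather than by the (exponential) Littlestone dimension.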
Computationally, the same construction methods (PRFs combined with an explicit evasive set $S$) yield classes for which online learning is hard against unbounded adversaries, yet efficient against bounded adversaries.
6. Connections to Classical PAC Learning Bounds
Classical results in PAC learning show that for a concept class of finite VC-dimension $d$, the optimal sample complexity in the realizable PAC setting is $\Theta\big((d + \log(1/\delta))/\varepsilon\big)$ (Hanneke, 2015). These optimal bounds hold for learning under arbitrary distributions. The samplable PAC findings indicate that, if distributions are restricted to samplable ones, these bounds may not capture the true sample or computational complexity: samplable PAC learning can circumvent high sample complexity by exploiting evasiveness in the domain, even when the VC-dimension is exponential. This underscores a fundamental distinction between distributional assumptions in learning theory (Blanc et al., 1 Dec 2025).
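The scale of the gap is easy to quantify numerically. The helper below evaluates the order-of-magnitude bound $\Theta((d + \log(1/\delta))/\varepsilon)$; the constant `c` is a placeholder, since the asymptotic statement leaves it unspecified.

```python
from math import log, ceil

def pac_sample_bound(d, eps, delta, c=1.0):
    """Order-of-magnitude realizable-PAC sample bound
    Theta((d + log(1/delta)) / eps); constant c is illustrative."""
    return ceil(c * (d + log(1.0 / delta)) / eps)

# Modest VC-dimension: a few hundred samples suffice.
small = pac_sample_bound(d=10, eps=0.1, delta=0.05)

# Exponential VC-dimension (here d = 2^40 as a stand-in for 2^Omega(n)):
# the distribution-free bound becomes astronomical, which is exactly the
# regime where samplable PAC can still learn with polynomially many samples.
huge = pac_sample_bound(d=2**40, eps=0.1, delta=0.05)
```

The contrast between `small` and `huge` is the quantitative content of the separation: the distribution-free bound is tight, yet irrelevant once attention is restricted to samplable distributions over an evasive domain.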
7. Open Problems and Future Directions
A central open problem is to characterize samplable PAC sample complexity in terms of both the complexities of the concept class and the samplability of the distribution, paralleling the role of VC-dimension in standard PAC settings. Current characterizations do not yield a combinatorial or analytical measure analogous to VC for samplable PAC. Further lines of investigation include developing measures of distributional complexity beyond samplability and constructing explicit evasive sets unconditionally (that is, outside random oracle or cryptographic assumptions), with the aim of better understanding and harnessing the expanded power of efficient learning under realistic data-generation constraints (Blanc et al., 1 Dec 2025).