Samplable PAC Learning & Evasive Sets
- Samplable PAC learning is a variant of the PAC framework that restricts attention to efficiently samplable data distributions, which can dramatically reduce sample complexity.
- The approach uses explicit evasive sets to achieve exponential separations from standard PAC learning, even when the VC-dimension is exponentially large.
- By integrating cryptographic assumptions and computationally bounded adversaries, the framework also yields computational and online-learning separations between bounded and unbounded settings.
Samplable PAC learning is a refinement of the classical Probably Approximately Correct (PAC) learning framework, in which the requirement to learn under all possible data distributions is relaxed to learning under distributions that are efficiently samplable. This modification, first explicitly formalized by Blum, Furst, Kearns, and Lipton (1993), has significant implications for both the statistical and computational complexity of learning, leading to new separations and open questions regarding the true nature of learnability when data-generation processes are algorithmically constrained (Blanc et al., 1 Dec 2025).
1. Formal Definitions and Framework
In the standard PAC model (Valiant, 1984), a concept class $\mathcal{C}$ is PAC-learnable if there is a learner $A$ that, given labeled examples drawn i.i.d. from any distribution $\mathcal{D}$ over the instance space and labeled by any target $f \in \mathcal{C}$, outputs a hypothesis $h$ such that $\Pr_{x \sim \mathcal{D}}[h(x) \neq f(x)] \le \varepsilon$ with probability at least $1 - \delta$,
for all $\varepsilon, \delta > 0$, all distributions $\mathcal{D}$, and all targets $f \in \mathcal{C}$.
The samplable PAC model retains this structure but only requires $A$ to succeed when $\mathcal{D}$ is efficiently samplable: that is, there exists a Boolean circuit $C$ of size at most $s(n)$ such that $C$, on uniformly random input bits, induces $\mathcal{D}$. The learner must succeed for all $\mathcal{D}$ samplable by circuits of size at most $s(n)$. The error metric remains $\Pr_{x \sim \mathcal{D}}[h(x) \neq f(x)]$ (Blanc et al., 1 Dec 2025).
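The sampler-restricted interaction above can be sketched in a few lines of Python. This is an illustrative toy, not the paper's construction: the `sampler` function stands in for a bounded-size circuit $C$ mapping random bits to instances, and the target concept is a hypothetical placeholder.

```python
import random

def sampler(r):
    """Toy 'sampler circuit': maps random bits r to an instance in {0,1}^8.
    Stands in for a bounded-size Boolean circuit C inducing D = C(U_m)."""
    n = 8
    x = [0] * n
    # Hypothetical sampler: only the first n//2 coordinates vary, so the
    # induced distribution D is concentrated on a small subcube.
    for i, bit in enumerate(r[: n // 2]):
        x[i] = bit
    return tuple(x)

def draw_examples(target, m, num_random_bits=16):
    """Draw m labeled examples (x, f(x)) with x ~ D induced by the sampler."""
    sample = []
    for _ in range(m):
        r = [random.randint(0, 1) for _ in range(num_random_bits)]
        x = sampler(r)
        sample.append((x, target(x)))
    return sample

# Hypothetical target concept: indicator of the all-zeros string.
f = lambda x: int(all(b == 0 for b in x))
data = draw_examples(f, 5)
```

The learner only ever sees instances that the circuit can produce, which is exactly the relaxation the samplable model exploits.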
2. Statistical Separations from Standard PAC Learning
A principal finding is the existence of concept classes for which samplable PAC learning is exponentially more powerful than standard PAC. Specifically, there exist classes $\mathcal{C}_n$ over $\{0,1\}^n$ such that:
- The VC-dimension of $\mathcal{C}_n$ is exponential in $n$, so standard PAC learning requires $2^{\Omega(n)}$ samples.
- The same class is learnable in the samplable PAC model with polynomial sample complexity.
This is realized through the construction of an evasive set $S \subseteq \{0,1\}^n$, defined so that every size-$s(n)$ samplable distribution "misses" $S$ outside of its heaviest points: for some polynomially bounded $k(n)$, all but a negligible fraction of the mass that $\mathcal{D}$ places on $S$ falls on at most $k(n)$ points. The corresponding concept class consists of the indicators of subsets $T \subseteq S$: $f_T(x) = 1$ if $x \in T$, $0$ otherwise. While its VC-dimension remains large, the evasiveness property ensures that, against samplable $\mathcal{D}$, memorization-based learners cover nearly all of $\mathcal{D}$'s mass using a small sample (Blanc et al., 1 Dec 2025).
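The memorization argument can be illustrated with a toy simulation. All names here are hypothetical stand-ins: `S` plays the role of the hard set, and the sampler, by assumption, can only reach a handful of its points, so a memorize-then-default-to-zero learner attains essentially zero error.

```python
import random

# Hypothetical stand-in for an evasive set: the toy sampler below can only
# ever reach three of its points (its "heaviest points").
S = {(i, i * i % 97) for i in range(50)}   # 50 "hard" points
reachable = [(1, 1), (2, 4), (3, 9)]       # the points a small sampler hits

def sampler():
    # Samplable D: all mass on three points of S.
    return random.choice(reachable)

def memorize_learner(sample):
    """Memorization learner: predict the seen label, default to 0 elsewhere."""
    table = dict(sample)
    return lambda x: table.get(x, 0)

target = lambda x: int(x in S)             # concept = indicator of a subset of S
train = []
for _ in range(100):
    x = sampler()
    train.append((x, target(x)))
h = memorize_learner(train)

# Test error against fresh draws from the same samplable distribution.
errors = sum(h(x) != target(x) for x in (sampler() for _ in range(1000)))
```

Although `S` has 50 points (and the induced concept class has high VC-dimension over `S`), the learner needs to memorize only the few points the sampler can reach.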
3. Explicit Evasive Sets and Efficient Learnability
Explicit evasive sets are sets $S \subseteq \{0,1\}^n$ characterized by:
- Membership in $S$ is decidable by a circuit of size $\mathrm{poly}(n)$.
- $|S|$ is super-polynomial in $n$.
- For every constant $c$, every size-$n^c$ sampler $k(n)$-misses $S$ for some polynomially bounded $k(n)$.
Such sets, if constructed, provide both hardness and efficient recognizability, which is crucial for sharp separations between standard and samplable PAC learnability. Their existence is connected to core complexity-theoretic hypotheses: if small circuits can existentially search for members of $S$, sampling can concentrate on fresh points of $S$, making $S$ non-evasive. In the random oracle model, explicit sets $S$ can be produced that are evasive for all polynomial-size oracle samplers, with $|S|$ super-polynomial in $n$ (Blanc et al., 1 Dec 2025).
4. Computational Separations and Reductions
The study extends beyond sample complexity to computational complexity:
- Let $\{f_k\}$ be a pseudorandom function (PRF) family secure against polynomial-size circuits; define the class of concepts $g_k$, where $g_k(x) = f_k(x)$ if $x \in S$, $0$ otherwise.
- Learning this class with respect to the uniform distribution on $S$ requires breaking the PRF, which is infeasible in polynomial time (under standard cryptographic assumptions).
- However, by the evasiveness of $S$, the class becomes learnable in polynomial time for all polynomial-size samplable distributions, as almost all of the probability mass falls on a small, easily memorizable subset.
This separation persists relative to a random oracle, demonstrating unconditional distinctions between standard and samplable PAC in oracle models (Blanc et al., 1 Dec 2025).
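The shape of the PRF-based class $g_k$ can be sketched concretely. This is a minimal illustration, not the paper's construction: HMAC-SHA256 is used here as a convenient PRF stand-in, and the tiny set `S` is a hypothetical placeholder for the evasive set.

```python
import hmac, hashlib

def prf_bit(key: bytes, x: bytes) -> int:
    """PRF stand-in: low-order bit of HMAC-SHA256 (illustrative only)."""
    return hmac.new(key, x, hashlib.sha256).digest()[0] & 1

# Hypothetical 'evasive' domain subset (a real one would be super-polynomial
# in size yet rarely hit by any small sampler).
S = {bytes([i]) for i in range(16)}

def concept(key):
    """g_k(x) = f_k(x) if x in S, else 0 -- the separation construction's shape."""
    return lambda x: prf_bit(key, x) if x in S else 0

g = concept(b"secret-key")
```

Under the uniform distribution on `S`, predicting `g` amounts to predicting PRF output bits, which is the source of the computational hardness; off `S`, the concept is trivially zero, which is what a memorization learner exploits against samplable distributions.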
5. Online Learning and Adversarial Efficiency
The samplable PAC principle translates to online learning models. In the classic online framework (Littlestone, 1988), an adversary supplies the instances; with no computational restriction, the adversary can force exponentially many mistakes, corresponding to an exponentially large Littlestone dimension. If the adversary is limited to producing instances via polynomial-size circuits (an efficient adversary), then for the same concept classes the best online learner's mistake bound drops to $\mathrm{poly}(n)$. A "default-zero" memorization learner achieves this low mistake bound because an efficient adversary can only produce a bounded number of distinct "hard" points in an evasive set (Blanc et al., 1 Dec 2025).
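The default-zero learner described above admits a short simulation. The adversary model here is a hypothetical toy: it is assumed to be able to produce only three distinct hard (positive) points, mirroring the bound an evasive set imposes on efficient adversaries.

```python
def default_zero_online(stream):
    """'Default-zero' learner: predict 0 until a point is revealed as
    positive, then remember it. Mistakes <= number of distinct positives."""
    positives, mistakes = set(), 0
    for x, true_label in stream:
        pred = 1 if x in positives else 0
        if pred != true_label:
            mistakes += 1
        if true_label == 1:
            positives.add(x)   # memorize the revealed hard point
    return mistakes

# Bounded adversary: only 3 distinct 'hard' (positive) points available,
# replayed many times, interleaved with easy negatives.
hard = ["a", "b", "c"]
stream = [(hard[t % 3], 1) for t in range(30)] + \
         [("easy%d" % t, 0) for t in range(30)]
mistakes = default_zero_online(stream)
```

The learner errs exactly once per distinct hard point and never on negatives, so its mistake count is bounded by the adversary's budget rather than by the (exponential) Littlestone dimension.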
Computationally, the same construction methods (PRFs combined with an explicit evasive set $S$) yield classes for which online learning is hard against unbounded adversaries, yet efficient against bounded adversaries.
6. Connections to Classical PAC Learning Bounds
Classical results in PAC learning show that for a concept class of finite VC-dimension $d$, the optimal sample complexity in the realizable PAC setting is $\Theta\big((d + \log(1/\delta))/\varepsilon\big)$ (Hanneke, 2015). These optimal bounds hold for learning under arbitrary distributions. The samplable PAC findings indicate that, if distributions are restricted to samplable ones, these bounds may not capture the true sample or computational complexity: samplable PAC learning can circumvent high sample complexity by exploiting evasiveness in the domain, even when the VC-dimension is exponential. This underscores a fundamental distinction between distributional assumptions in learning theory (Blanc et al., 1 Dec 2025).
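The scale of the gap is easy to quantify numerically. The helper below evaluates the order-of-magnitude bound $\Theta((d + \log(1/\delta))/\varepsilon)$; the constant `c` is a placeholder, since the asymptotic statement leaves it unspecified.

```python
from math import log, ceil

def pac_sample_bound(d, eps, delta, c=1.0):
    """Order-of-magnitude realizable-PAC sample bound
    Theta((d + log(1/delta)) / eps); constant c is illustrative."""
    return ceil(c * (d + log(1.0 / delta)) / eps)

# Modest VC-dimension: a few hundred samples suffice.
small = pac_sample_bound(d=10, eps=0.1, delta=0.05)

# Exponential VC-dimension (here d = 2^40 as a stand-in for 2^Omega(n)):
# the distribution-free bound becomes astronomical, which is exactly the
# regime where samplable PAC can still learn with polynomially many samples.
huge = pac_sample_bound(d=2**40, eps=0.1, delta=0.05)
```

The contrast between `small` and `huge` is the quantitative content of the separation: the distribution-free bound is tight, yet irrelevant once attention is restricted to samplable distributions over an evasive domain.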
7. Open Problems and Future Directions
A central open problem is to characterize samplable PAC sample complexity in terms of both the complexities of the concept class and the samplability of the distribution, paralleling the role of VC-dimension in standard PAC settings. Current characterizations do not yield a combinatorial or analytical measure analogous to VC for samplable PAC. Further lines of investigation include developing measures of distributional complexity beyond samplability and constructing explicit evasive sets unconditionally (that is, outside random oracle or cryptographic assumptions), with the aim of better understanding and harnessing the expanded power of efficient learning under realistic data-generation constraints (Blanc et al., 1 Dec 2025).