
Samplable PAC Learning & Evasive Sets

Updated 8 December 2025
  • Samplable PAC learning is a variant of the PAC framework that restricts attention to efficiently samplable data distributions, which can dramatically reduce sample complexity.
  • The approach uses explicit evasive sets to achieve exponential separations from standard PAC learning even when the VC-dimension is high.
  • By incorporating cryptographic assumptions and computationally bounded adversaries, the framework also yields computational and online-learning separations.

Samplable PAC learning is a refinement of the classical Probably Approximately Correct (PAC) learning framework, in which the requirement to learn under all possible data distributions is relaxed to learning under distributions that are efficiently samplable. This modification, first explicitly formalized by Blum, Furst, Kearns, and Lipton (1993), has significant implications for both the statistical and computational complexity of learning, leading to new separations and open questions regarding the true nature of learnability when data-generation processes are algorithmically constrained (Blanc et al., 1 Dec 2025).

1. Formal Definitions and Framework

In the standard PAC model (Valiant, 1984), a concept class $C \subseteq \{c : X \to \{0,1\}\}$ is PAC-learnable if there is a learner $A$ that, given $m$ labeled examples drawn i.i.d. from any distribution $D$ over the instance space $X$ with any target $c \in C$, outputs a hypothesis $h$ such that

$$\Pr_{S \sim (D, c)^m}\Big[\Pr_{x \sim D}[h(x) \neq c(x)] \leq \epsilon\Big] \geq 1 - \delta$$

for all $c \in C$ and all $D$.

The samplable PAC model retains this structure but only requires $A$ to succeed when $D$ is efficiently samplable: that is, there exists a Boolean circuit $G$ of size at most $s$ such that $G(\text{Uniform over } \{0,1\}^\ell)$ induces $D$ for some $\ell$. The learner must succeed for all samplable $D$ of size at most $\text{poly}(n)$. The formal error metric is $\mathrm{err}_D(h) = \Pr_{x \sim D}[h(x) \neq c(x)]$ (Blanc et al., 1 Dec 2025).
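
As a concrete illustration, the sketch below (a toy example with hypothetical names, not taken from the paper) represents an efficiently samplable distribution by a generator function $G$ mapping uniform random bits to instances, and estimates the error metric by Monte Carlo sampling:

```python
import random

def sampler(seed_bits):
    """Toy 'circuit' G: maps l = 8 uniform bits to an 8-bit instance,
    here by AND-ing adjacent seed bits (biasing instances toward zeros)."""
    return tuple(b & seed_bits[(i + 1) % 8] for i, b in enumerate(seed_bits))

def draw(G, l=8):
    """Sample x ~ D, where D = G(Uniform over {0,1}^l)."""
    return G([random.randint(0, 1) for _ in range(l)])

def err_D(h, c, G, trials=10_000):
    """Monte Carlo estimate of err_D(h) = Pr_{x ~ D}[h(x) != c(x)]."""
    return sum(h(x) != c(x) for x in (draw(G) for _ in range(trials))) / trials

c = lambda x: x[0]   # target concept: the first bit of the instance
h = lambda x: 0      # trivial all-zeros hypothesis
print(err_D(h, c, sampler))   # close to Pr[x_0 = 1] = 1/4 under this D
```

Any distribution expressible as such a generator of bounded size is "samplable" in the model's sense; the learner only needs to perform well against distributions of this form.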

2. Statistical Separations from Standard PAC Learning

A principal finding is the existence of concept classes where samplable PAC learning is exponentially more powerful than standard PAC. Specifically, there exist classes $C$ over $\{0,1\}^n$ such that:

  • $\text{VCdim}(C) = 2^{\Omega(n)}$, so the standard PAC sample complexity is $2^{\Omega(n)}$.
  • The same class is learnable in samplable PAC with polynomial sample complexity.

This is realized through the construction of an $(\epsilon, k)$-evasive set $H \subset \{0,1\}^n$, defined so that every size-$s$ samplable distribution $D$ "misses" $H$ outside of its $k = O((s \log s)/\epsilon)$ heaviest points, meaning $D(H \setminus H^*) < \epsilon$ for some $H^* \subset H$ with $|H^*| = k$. The corresponding concept class is $C_H = \{f_H : f : \{0,1\}^n \to \{0,1\}\}$, where $f_H(x) = f(x)$ if $x \in H$ and $f_H(x) = 0$ otherwise. While its VC-dimension remains large, the evasiveness property ensures that, against samplable $D$, memorization-based learners cover nearly all of $D$'s mass using a small sample $S$ (Blanc et al., 1 Dec 2025).
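
The memorization-based strategy can be sketched in Python. This is a toy caricature with hypothetical numbers, not the paper's construction: the sampler concentrates its mass on a few "heavy" points, and the learner memorizes labels it has seen and predicts $0$ elsewhere (sound off $H$, since $f_H \equiv 0$ there):

```python
import random

def memorize_and_default_zero(sample):
    """Memorization learner for C_H: store the labels seen in the sample,
    predict 0 on any unseen point."""
    table = dict(sample)               # x -> f_H(x) for each labeled example
    return lambda x: table.get(x, 0)

# Toy setup: H is a small set; the sampler D puts most of its mass on a
# few heavy points of H and only rarely lands elsewhere in H.
H = {101, 202, 303}
heavy = [101, 202]                      # the k heaviest points of H under D
def draw_D():
    r = random.random()
    if r < 0.45: return heavy[0]
    if r < 0.90: return heavy[1]
    return random.randrange(1000)       # thin mass; rarely hits H \ heavy

f = lambda x: (x in H) and (x % 2)      # some f_H in C_H: zero off H
S = [(x, int(f(x))) for x in (draw_D() for _ in range(200))]
h = memorize_and_default_zero(S)
err = sum(h(x) != int(f(x)) for x in (draw_D() for _ in range(10_000))) / 10_000
print(err)                              # small: the sample covers D's mass
```

The heavy points are seen (and memorized) with overwhelming probability in a small sample, and evasiveness bounds the leftover mass on $H$, so the learner's error stays below $\epsilon$.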

3. Explicit Evasive Sets and Efficient Learnability

Explicit evasive sets are sets $H_n \subset \{0,1\}^n$ characterized by:

  • Membership in $H_n$ is decidable by a circuit of size $\text{poly}(n)$.
  • $|H_n|$ is super-polynomial in $n$.
  • For all $s = \text{poly}(n)$, every size-$s$ sampler $D$ $(1/s, s^c)$-misses $H_n$ for some constant $c$.
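
The "$(\epsilon, k)$-misses" condition can be checked empirically for a given toy sampler and set; the sketch below (hypothetical names and parameters) estimates the mass a sampler places on $H$ outside its $k$ heaviest points:

```python
import random
from collections import Counter

def miss_mass(draw, H, k, trials=50_000):
    """Estimate the mass D places on H outside its k heaviest points,
    using the k most frequently drawn points of H as a proxy for H*."""
    hits = Counter(x for x in (draw() for _ in range(trials)) if x in H)
    heavy = {x for x, _ in hits.most_common(k)}
    return sum(n for x, n in hits.items() if x not in heavy) / trials

# Toy sampler that concentrates on one point of H and otherwise spreads
# thinly over the whole domain:
H = set(range(0, 1000, 7))              # membership is cheap to decide
def draw():
    return 7 if random.random() < 0.9 else random.randrange(1000)

print(miss_mass(draw, H, k=2))          # small: D (eps, 2)-misses H here
```

A genuinely evasive set must satisfy this for every polynomial-size sampler simultaneously, which is what makes unconditional explicit constructions hard.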

Such sets, if constructed, provide both hardness and efficient recognizability, which is crucial for sharp separations between standard and samplable PAC learnability. Their existence is connected to core complexity-theoretic hypotheses: if $P = NP$, existential sampling renders $H$ non-evasive. In the random oracle model, explicit sets $H^O$ can be produced that are evasive for all polynomial-size oracle samplers, with $|H^O| = 2^{\Omega(n)}$ (Blanc et al., 1 Dec 2025).

4. Computational Separations and Reductions

The study extends beyond sample complexity to computational complexity:

  • Let $F$ be a pseudorandom function family secure against $\text{poly}(n)$-size circuits; define $C = \{f_H : f \in F\}$, where $f_H(x) = f(x)$ if $x \in H$ and $f_H(x) = 0$ otherwise.
  • Learning $C$ with respect to the uniform distribution on $H$ requires breaking the PRF, which is infeasible in $\text{poly}(n)$ time (under standard cryptographic assumptions).
  • However, by the evasiveness of $H$, $C$ becomes learnable in $\text{poly}(n)$ time for all size-$s$ samplable distributions, since almost all of the probability mass falls on a small, easily memorizable subset.
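
The shape of this separation can be caricatured in code. The sketch below is a toy illustration only: it uses a SHA-256 keyed hash as a stand-in for a PRF, a small hypothetical set in place of an evasive $H$, and the memorize-and-default-zero learner against a samplable distribution:

```python
import hashlib
import random

def prf(key, x):
    """Stand-in PRF: one output bit of a SHA-256 keyed hash (the actual
    construction assumes a provably secure PRF family)."""
    return hashlib.sha256(key + x.to_bytes(4, "big")).digest()[0] & 1

H = set(range(500, 520))        # toy stand-in for an explicit evasive set
key = b"secret"
f_H = lambda x: prf(key, x) if x in H else 0   # a concept f_H with f in F

# A samplable D puts almost all of its mass on one heavy point, so the
# memorize-and-default-zero learner succeeds without breaking the PRF.
def draw_D():
    return 505 if random.random() < 0.95 else random.randrange(1000)

sample = [(x, f_H(x)) for x in (draw_D() for _ in range(100))]
table = dict(sample)
h = lambda x: table.get(x, 0)   # default-zero hypothesis
err = sum(h(x) != f_H(x) for x in (draw_D() for _ in range(10_000))) / 10_000
print(err)                      # small against this samplable D
```

Against the uniform distribution on $H$, by contrast, predicting $f_H$ on unseen points of $H$ would amount to predicting PRF outputs, which no polynomial-time learner can do.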

This separation persists relative to a random oracle, demonstrating unconditional distinctions between standard and samplable PAC in oracle models (Blanc et al., 1 Dec 2025).

5. Online Learning and Adversarial Efficiency

The samplable PAC principle translates to online learning models. In the classic online framework (Littlestone, 1988), an adversary provides instances; with no computational restriction, the adversary can force exponentially many mistakes (corresponding to a Littlestone dimension of $2^{\Omega(n)}$ for some $C$). If the adversary is limited to producing instances by polynomial-size circuits (an efficient adversary), then for the same concept classes the best online learner's number of mistakes drops to $O(s \log s)$. A "default-zero" memorization learner achieves low mistake bounds because an efficient adversary can only produce a bounded number of distinct "hard" points in an evasive set $H$ (Blanc et al., 1 Dec 2025).
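
The default-zero learner is simple to state in code. The sketch below (toy, hypothetical adversary and points) predicts $0$ until corrected and memorizes each correction, so its mistakes are bounded by the number of distinct hard points the adversary can actually produce:

```python
class DefaultZeroLearner:
    """Online learner for C_H: predict 0 until corrected, then memorize.
    Mistakes are bounded by the number of distinct points x with
    f_H(x) = 1 that the adversary ever presents."""
    def __init__(self):
        self.table = {}
    def predict(self, x):
        return self.table.get(x, 0)
    def update(self, x, label):
        self.table[x] = label

# An 'efficient adversary' that can only find a few hard points of H:
hard_points = [11, 22, 33]
stream = hard_points * 5 + [4, 5, 6] * 5     # repeats cannot cause mistakes
f_H = lambda x: 1 if x in hard_points else 0

learner, mistakes = DefaultZeroLearner(), 0
for x in stream:
    if learner.predict(x) != f_H(x):
        mistakes += 1
        learner.update(x, f_H(x))
print(mistakes)                               # → 3, one per distinct hard point
```

An unbounded adversary could keep producing fresh points of $H$ and force a mistake each round; the computational bound on the adversary is what caps the mistake count.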

Computationally, the same construction methods (PRFs combined with explicit $H$) yield classes where online learning is hard against unbounded adversaries, yet efficient against bounded adversaries.

6. Connections to Classical PAC Learning Bounds

Classical results in PAC learning show that, for a concept class $C$ of finite VC-dimension $d$, the sample complexity in the realizable PAC setting is $\Theta((1/\epsilon)(d + \log(1/\delta)))$ (Hanneke, 2015). These optimal bounds hold for learning under arbitrary distributions. The samplable PAC findings indicate that, when distributions are restricted to samplable ones, these bounds may not capture the true sample or computational complexity: samplable PAC learning can circumvent high sample complexity by exploiting evasiveness in the domain, even when the VC-dimension is exponential. This underscores a fundamental distinction between distributional assumptions in learning theory (Blanc et al., 1 Dec 2025).
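
For a sense of scale, the classical bound can be evaluated numerically (a sketch that ignores the hidden constants in the $\Theta(\cdot)$):

```python
import math

def pac_sample_bound(d, eps, delta):
    """Realizable PAC sample complexity, up to constant factors:
    m = Theta((1/eps) * (d + log(1/delta)))  (Hanneke, 2015)."""
    return math.ceil((d + math.log(1 / delta)) / eps)

# Modest VC dimension: polynomial sample complexity.
print(pac_sample_bound(d=10, eps=0.1, delta=0.05))       # → 130

# Exponential VC dimension (d = 2^n for n = 20): the distribution-free
# bound explodes, while samplable PAC learning of an evasive-set class
# needs only polynomially many samples.
print(pac_sample_bound(d=2**20, eps=0.1, delta=0.05))
```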

7. Open Problems and Future Directions

A central open problem is to characterize samplable PAC sample complexity in terms of both the complexities of the concept class and the samplability of the distribution, paralleling the role of VC-dimension in standard PAC settings. Current characterizations do not yield a combinatorial or analytical measure analogous to VC for samplable PAC. Further lines of investigation include developing measures of distributional complexity beyond samplability and constructing explicit evasive sets unconditionally (that is, outside random oracle or cryptographic assumptions), with the aim of better understanding and harnessing the expanded power of efficient learning under realistic data-generation constraints (Blanc et al., 1 Dec 2025).
