
The Probably Approximately Correct Learning Model in Computational Learning Theory

Published 11 Nov 2025 in stat.ML and cs.LG | (2511.08791v1)

Abstract: This survey paper gives an overview of various known results on learning classes of Boolean functions in Valiant's Probably Approximately Correct (PAC) learning model and its commonly studied variants.

Summary

  • The paper formalizes the PAC learning model, establishing a rigorous link between computational limits and statistical inference in learning from data.
  • It details algorithmic techniques and sample complexity bounds for learning various Boolean function classes under both distribution-free and uniform settings.
  • The paper explores inherent hardness results via NP-hardness, cryptographic assumptions, and average-case challenges, guiding future research directions.

Introduction and Historical Context

The Probably Approximately Correct (PAC) learning model, introduced by Valiant, formalized the study of learnability by defining clear algorithmic and complexity-theoretic foundations for learning from data. This model balances statistical and computational considerations, establishing a framework to quantitatively reason about what classes of functions are feasibly learnable by algorithms with bounded resources. The central scientific problem that arises from the PAC framework is to demarcate the boundary between efficiently learnable and non-learnable function classes, laying the groundwork for a program whose implications extend broadly across theoretical computer science and machine learning.

PAC Learning Framework: Definitions and Intuitions

Model Ingredients

  • Instance Space ($X$): typically $\{0,1\}^n$ or $\mathbb{R}^n$, representing feature vectors.
  • Concepts and Concept Classes ($\mathcal{C}$): Boolean-valued functions (or subsets thereof), usually defined via some syntactic constraint (e.g., conjunctions, DNF, LTFs).
  • Distribution ($\mathcal{D}$): an arbitrary, unknown distribution over $X$ (the distribution-free setting).
  • Sample Access: the learner draws i.i.d. labeled samples $(x, f(x))$ with $x \sim \mathcal{D}$.
  • Hypothesis Class ($\mathcal{H}$): the functions output as predictions; may or may not coincide with $\mathcal{C}$ (proper vs. improper learning).

Success Criteria

An algorithm $A$ PAC-learns $\mathcal{C}$ if, for any unknown target $f \in \mathcal{C}$ and any distribution $\mathcal{D}$, given parameters $0 < \epsilon, \delta < 1$, $A$ outputs (with probability at least $1-\delta$ over its sample) a hypothesis $h$ with $\mathrm{error}_{\mathcal{D}}(h, f) \leq \epsilon$. Computational and statistical efficiency require $A$ to run, and draw samples, in time polynomial in $n$, $1/\epsilon$, and $\log(1/\delta)$.
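On the statistical side of this definition, the standard finite-class (Occam) bound states that any learner outputting a hypothesis consistent with $m \geq \frac{1}{\epsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right)$ samples is an $(\epsilon, \delta)$-PAC learner. A minimal sketch of this generic bound (the function name is ours, not the paper's):

```python
import math

def occam_sample_bound(log_h, eps, delta):
    """Generic sample-size bound for a consistent learner over a
    finite hypothesis class H: m >= (1/eps)*(ln|H| + ln(1/delta))
    samples suffice for (eps, delta)-PAC learning.
    log_h is ln|H| (passed as a log to avoid huge integers)."""
    return math.ceil((log_h + math.log(1 / delta)) / eps)
```

For example, a class of $2^{10}$ hypotheses at accuracy $\epsilon = 0.1$ and confidence $\delta = 0.05$ needs only 100 samples, illustrating the logarithmic dependence on both $|\mathcal{H}|$ and $1/\delta$.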

Model Variants

Commonly studied extensions of the PAC model include:

  • Distribution-specific settings (e.g., uniform/Gaussian distributions).
  • Membership (black-box) query access.
  • Learning under various noise models (malicious, agnostic, random-classification-noise).
  • Relaxations to non-realizability (agnostic learning).

The flexibility and generality of the model solidified PAC as the canonical abstraction for computational learning theory.

Exemplary Algorithms and Learnable Classes

Boolean Conjunctions

The classical elimination algorithm for conjunctions removes literals that contradict observed positive examples. A sample complexity of $O\left(\frac{1}{\epsilon}\left(n + \log \frac{1}{\delta}\right)\right)$ and polynomial runtime suffice for PAC learnability.
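The elimination step can be sketched as follows (the function names and the literal encoding are ours, assuming examples arrive as pairs of a 0/1 vector and a label):

```python
def learn_conjunction(examples, n):
    """PAC-learn a conjunction of literals via elimination: start with
    all 2n literals and delete every literal falsified by some positive
    example.  Negative examples are ignored; the surviving literals
    form the hypothesis conjunction."""
    # literal (i, True) means x_i; (i, False) means NOT x_i
    literals = {(i, b) for i in range(n) for b in (True, False)}
    for x, label in examples:
        if label == 1:
            # keep only literals the positive example satisfies
            literals = {(i, b) for (i, b) in literals if x[i] == b}
    return literals

def predict(literals, x):
    # the conjunction is true iff every surviving literal is satisfied
    return int(all(x[i] == b for (i, b) in literals))
```

Because any literal in the true conjunction is never eliminated, the hypothesis errs only one-sidedly (on positives), which is what drives the $O\left(\frac{1}{\epsilon}(n + \log\frac{1}{\delta})\right)$ sample bound.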

Extensions by Feature Expansion

Applying feature expansion, the elimination method enables PAC learning of $k$-CNF and $k$-DNF in $n^{O(k)}$ time. Despite decades of research, it remains open whether $k$-term DNF can be learned in subexponential time for superconstant $k$.
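The feature-expansion idea can be made concrete: a $k$-CNF over $x$ is a monotone conjunction over the $O(n^k)$ features "clause $c$ is satisfied by $x$", so elimination applies directly in the expanded space. A self-contained sketch under that reading (names ours; assumes at least one positive example):

```python
from itertools import combinations, product

def expand_kclauses(x, n, k):
    """Map x in {0,1}^n to the truth values of all disjunctions of
    at most k literals (the feature expansion for k-CNF)."""
    feats = {}
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for signs in product((True, False), repeat=size):
                clause = tuple(zip(idxs, signs))
                feats[clause] = int(any(x[i] == s for i, s in clause))
    return feats

def learn_kcnf(examples, n, k):
    """Elimination in the expanded space: keep exactly the clauses
    that are true on every positive example."""
    kept = None
    for x, label in examples:
        if label == 1:
            true_clauses = {c for c, v in expand_kclauses(x, n, k).items() if v}
            kept = true_clauses if kept is None else kept & true_clauses
    return kept

def predict_kcnf(clauses, x):
    return int(all(any(x[i] == s for i, s in c) for c in clauses))
```

The $n^{O(k)}$ runtime in the text is exactly the cost of enumerating these clauses.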

Decision Lists and Trees

Consistent-hypothesis finders and greedy approaches yield PAC learning of $k$-decision lists, and size-$s$ decision trees are learnable in quasi-polynomial time, either via specialized recursive algorithms or by reduction to decision lists.
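The greedy approach for 1-decision lists (in the style of Rivest's algorithm; the encoding below is ours) repeatedly finds a literal whose satisfying examples all share one label, emits that rule, and discards the covered examples:

```python
def learn_decision_list(examples, n):
    """Greedy learner for 1-decision lists: each rule is ((i, b), label),
    firing when x[i] == b.  Returns None if no consistent 1-decision
    list exists for the sample."""
    remaining = list(examples)
    rules = []
    while remaining:
        found = False
        for i in range(n):
            for b in (True, False):
                covered = [(x, y) for x, y in remaining if x[i] == b]
                labels = {y for _, y in covered}
                if covered and len(labels) == 1:
                    rules.append(((i, b), labels.pop()))
                    remaining = [(x, y) for x, y in remaining if x[i] != b]
                    found = True
                    break
            if found:
                break
        if not found:
            return None  # sample is not consistent with any 1-decision list
    return rules

def predict_dl(rules, x, default=0):
    for (i, b), label in rules:
        if x[i] == b:
            return label
    return default
</n```

The same idea extends to $k$-decision lists by letting rules test conjunctions of up to $k$ literals, at cost $n^{O(k)}$.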

Parities and Algebraic Classes

Learning parities reduces to solving linear systems over $\mathbb{F}_2$; by feature expansion, $\mathbb{F}_2$-polynomials of degree $k$ are learnable in $n^{O(k)}$ time.
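The reduction is literal: each sample $(x, f(x))$ with $f(x) = \langle a, x\rangle \bmod 2$ is one linear equation in the unknown vector $a$, so Gaussian elimination over $\mathbb{F}_2$ recovers a consistent parity. A minimal sketch (names ours):

```python
def learn_parity(examples, n):
    """Recover a parity f(x) = <a, x> mod 2 by Gaussian elimination
    over F_2 on the linear system induced by the labeled samples.
    Free variables (underdetermined coordinates) are set to 0."""
    rows = [list(x) + [y] for x, y in examples]  # augmented matrix
    a = [0] * n
    pivot_row, pivots = 0, []
    for col in range(n):
        for r in range(pivot_row, len(rows)):
            if rows[r][col] == 1:
                rows[pivot_row], rows[r] = rows[r], rows[pivot_row]
                break
        else:
            continue  # no pivot in this column
        pivots.append(col)
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] == 1:
                rows[r] = [u ^ v for u, v in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    for r, col in zip(range(pivot_row), pivots):
        a[col] = rows[r][n]
    return a

def parity(a, x):
    return sum(ai & xi for ai, xi in zip(a, x)) % 2
```

For degree-$k$ $\mathbb{F}_2$-polynomials, the same solver runs over the expanded feature vector of all $\binom{n}{\le k}$ monomials, giving the $n^{O(k)}$ bound.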

PAC Learning Linear and Polynomial Threshold Functions

The polynomial learnability of LTFs leverages linear programming and VC dimension bounds; the extension to polynomial threshold functions (PTFs) of degree $d$ is achieved by feature lifting, with computational complexity $n^{O(d)}$.
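The LP-based learner in the text solves a linear feasibility problem (find $w, \theta$ with $\langle w, x\rangle > \theta$ on positives and $\le \theta$ on negatives). As a self-contained stand-in for an LP solver, the classic Perceptron rule solves the same problem whenever the sample is separable with positive margin; the encoding below (threshold folded into an extra coordinate) is ours:

```python
def perceptron(examples, n, max_iters=1000):
    """Find a consistent halfspace sign(<w, x> - theta) via the
    Perceptron update rule.  Returns the weight vector (with the
    threshold as the last coordinate) once a full pass over the data
    makes no mistakes, or None if max_iters passes are exhausted."""
    w = [0.0] * (n + 1)  # last coordinate multiplies a fixed -1 input
    for _ in range(max_iters):
        mistakes = 0
        for x, y in examples:
            xe = list(x) + [-1.0]
            pred = 1 if sum(wi * xi for wi, xi in zip(w, xe)) > 0 else 0
            if pred != y:
                sign = 1 if y == 1 else -1
                w = [wi + sign * xi for wi, xi in zip(w, xe)]
                mistakes += 1
        if mistakes == 0:
            return w
    return None
```

For PTFs of degree $d$, one first lifts $x$ to all monomials of degree at most $d$ and then runs the same halfspace learner in the lifted space, which is where the $n^{O(d)}$ cost arises.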

Robustness and Limits

Feature-based approaches deliver learnability for intersections of bounded-weight LTFs, DNF, and formulas of bounded size. However, natural function classes (e.g., general DNF, intersections of arbitrary halfspaces, $k$-juntas for superconstant $k$) resist all known algorithms, often due to representational and computational bottlenecks.

Distribution-Specific Learning and Fourier Analysis

Restricting the distribution (most notably to the uniform distribution) unlocks a powerful suite of tools, pivotal among them being Fourier analysis. The low-degree algorithm, predicated on the concentration of the Fourier spectrum, enables learning of classes such as $\mathsf{AC}^0$, decision trees, and functions with small $L_1$-norm. Membership queries further facilitate efficient identification of "heavy" Fourier coefficients, underpinning learnability of decision trees and DNF in polynomial time under the uniform distribution.
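A sketch of the low-degree (Linial-Mansour-Nisan) algorithm: estimate every Fourier coefficient of degree at most $d$ from uniform samples, then predict with the sign of the truncated expansion. The code below simulates sample access by querying a target oracle on uniformly random inputs; all names are ours:

```python
import random
from itertools import combinations

def chi(S, x):
    """Character chi_S(x) = (-1)^(sum of x_i for i in S), x in {0,1}^n."""
    return (-1) ** sum(x[i] for i in S)

def low_degree_learn(f, n, d, m, rng):
    """Low-degree algorithm sketch: estimate all Fourier coefficients
    of degree <= d from m uniform samples, and return the hypothesis
    given by the sign of the truncated Fourier expansion."""
    samples = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(m)]
    labels = [(-1) ** f(x) for x in samples]  # +-1 encoding of f
    coeffs = {}
    for size in range(d + 1):
        for S in combinations(range(n), size):
            coeffs[S] = sum(y * chi(S, x) for x, y in zip(samples, labels)) / m

    def h(x):
        val = sum(c * chi(S, x) for S, c in coeffs.items())
        return 0 if val >= 0 else 1  # map sign back to {0,1}

    return h
```

When the class has its Fourier spectrum concentrated on degree $\le d$ (as $\mathsf{AC}^0$ does for $d = \mathrm{polylog}(n)$), the truncated expansion is close to $f$ and the sign prediction achieves small error.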

The uniform PAC setting exposes clear algorithmic barriers: learning $k$-juntas (even monotone ones) remains $n^{\Theta(k)}$-hard due to the inherent combinatorial search problem for the relevant variables.

Hardness of PAC Learning

Representation-Dependent Lower Bounds

NP-hardness-based reductions show that proper PAC learning is infeasible for many rich concept classes (e.g., monotone $k$-DNF for small $k$, read-once Boolean formulas, threshold functions) under standard complexity conjectures. The close connection between hardness of approximation and hardness of learning is pivotal in these constructions. Recent results extend this to the agnostic setting, amplifying the difficulty of proper learning and highlighting polynomial approximation limits even for simple classes.

Representation-Independent Hardness

Advanced negative results derive from cryptographic and average-case hardness assumptions:

  • Pseudorandom Function Families: the existence of any PRFF implies that no polynomial-time algorithm (even with queries) can learn a concept class containing the PRFF.
  • Public-key Cryptography: the security of cryptosystems (e.g., RSA, discrete logarithms, lattice problems) translates into average-case hardness of learning decryption circuits, implying that shallow circuit classes (even $\mathsf{NC}^1$) and finite automata are not learnable in polynomial time unless secret-key reconstruction is tractable.
  • Average-Case CSP Hardness: harnessing the assumed average-case difficulty of refuting random $k$-SAT or $k$-XOR instances, one shows hardness of learning even weakly expressive classes (e.g., DNF with $\omega(1)$ terms, intersections of a few halfspaces, agnostic learning of a single halfspace).
  • Agnostic Learning: representation-independent barriers persist for agnostic learning under robust average-case hardness assumptions, accentuating intrinsic algorithmic limits beyond the PAC framework's realizable case.

Implications and Future Directions

The PAC model provided the rigorous backbone for the study of efficient learnability, exposing deep connections to combinatorics, optimization, cryptography, and complexity theory. It underpins much of the theoretical infrastructure for machine learning, contributing ideas that influenced techniques such as boosting, sample complexity bounds via VC dimension, and theories of statistical query learning.

Theoretical implications include:

  • Demonstration that computational, not just information-theoretic, complexity typically governs feasibility of learning nontrivial classes.
  • Identification of universal trade-offs between accuracy, confidence, sample complexity, and running time.
  • Revelations that successful learning is inextricable from pseudorandomness and cryptographic intractability.

Practically, the abstractions inspired algorithms (and sometimes direct techniques) for robust, noise-tolerant learning and provide a lens for understanding the feasibility of tasks in modern large-scale machine learning. Furthermore, the delineation of concept classes into learnable and non-learnable regimes continues to influence the design of expressiveness-bounded model architectures and motivates current research into models circumventing known hardness barriers.

Speculatively, future progress may depend on:

  • New algorithmic paradigms not captured by current reductions or on refuting widely believed complexity conjectures.
  • Structural representation-theoretic advances for Boolean functions.
  • More refined understandings of average-case complexity in relation to learning.

Conclusion

The PAC framework, with its rigorous formalism, continues to structure the field of computational learning theory. It offers a unifying perspective tying together statistical and computational limits, and has enabled a precise understanding of which function classes are susceptible to efficient learning. Its legacy is not merely technical, but also conceptual, shaping the way both theoretical and applied machine learning communities approach the question of what can be learned—by algorithms, and ultimately, by machines in the world.
