Abductive Meta-Interpretive Learning
- Abductive Meta-Interpretive Learning is a neuro-symbolic framework that combines abduction, induction, and parameter optimization to derive symbolic logic programs from raw data.
- It employs an iterative cycle of sub-symbolic perception, abduction, and meta-interpretive induction to enable predicate invention and efficient program synthesis.
- Applied to tasks such as arithmetic induction, sorting, and synthetic biology, it demonstrates high accuracy and data efficiency compared to traditional models.
Abductive Meta-Interpretive Learning (Meta₍Abd₎) is a neuro-symbolic learning framework that unifies abduction, induction, and parameter optimization to jointly learn sub-symbolic perception models, induce symbolic first-order logic programs, and explain raw data in terms of latent symbolic facts. By integrating abduction with meta-interpretive learning, Meta₍Abd₎ addresses challenges fundamental to neuro-symbolic reasoning—including data-efficient program induction, predicate invention, and learning from raw, perceptual inputs without a pre-existing symbolic knowledge base—within a unified probabilistic or cost-based inference paradigm (Dai et al., 2020, Dai et al., 2021).
1. Foundational Principles and Formal Structure
Meta₍Abd₎ extends the framework of Inductive Logic Programming (ILP) by embedding abduction to infer plausible symbolic groundings from raw data, and by employing meta-interpretive induction to induce first-order logic programs, potentially involving predicate invention and recursion. The model supports joint optimization of symbolic structure and real-valued parameters associated with perception modules or numerical theory components.
The fundamental entities in Meta₍Abd₎ are:
- Logical Languages:
- L_B: Background language for non-abducible predicates.
- L_A: Abducible language for atoms hypothesized to explain observations.
- L_T: Target language of predicates to be defined.
- Components:
- Background Knowledge (BK): Finite set of Horn clauses, numerical modules (e.g., neural networks or ODE solvers), and meta-rule templates.
- Abducible Set (A): Subset of L_A; ground atoms abduced to explain the data.
- Hypothesis Clauses (H): Induced logic program, a subset of L_T, constructed from meta-rule instantiations.
- Parameters (θ): Real-valued parameters distributed across neural modules and the symbolic theory.
- Objective: minimize over (H, A, θ) the regularized loss
  J(H, A, θ) = L_fit(D; BK ∪ H ∪ A, θ) + λ · C(H),
  where L_fit is a data-fit loss, typically mean squared error for regression, and C(H) penalizes model complexity (Dai et al., 2021).
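The shape of this objective can be illustrated with a minimal sketch. All names here (`mse`, `complexity`, `objective`) are illustrative stand-ins, not the Meta₍Abd₎ implementation:

```python
# Sketch of the complexity-regularized objective: data-fit loss plus a
# penalty on the size of the induced program H.

def mse(predictions, targets):
    """Mean squared error, the typical data-fit loss for regression."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def complexity(hypothesis_clauses):
    """Penalize model size, e.g. by the total number of literals
    across the induced clauses."""
    return sum(len(clause) for clause in hypothesis_clauses)

def objective(predictions, targets, hypothesis_clauses, lam=0.1):
    """J(H, A, theta) = L_fit + lambda * C(H)."""
    return mse(predictions, targets) + lam * complexity(hypothesis_clauses)
```

The trade-off parameter `lam` balances fitting the data against keeping the induced theory concise.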
2. Architectural and Algorithmic Workflow
The joint learning process in Meta₍Abd₎ follows an Expectation-Maximization-style loop that alternately performs abduction, induction, and parameter fitting:
- Sub-symbolic Perception: A neural network or numerical module maps each raw input x to a soft assignment P(z | x, θ) over symbolic pseudo-labels z.
- Abduction:
For each datum, abduction hypothesizes a minimal set of ground atoms A such that, together with BK and the current H, the observed output y is entailed: BK ∪ H ∪ A ⊨ y. Abducibles commonly include latent label assignments, constraints, or unknown mechanistic constants.
- Meta-Interpretive Induction: The meta-interpreter grows by instantiating meta-rules from , possibly inventing new predicates or clause structures. Mode declarations provide syntactic bias and literal constraints.
- Parameter Optimization: θ is updated (e.g., via gradient descent) by maximizing the likelihood or minimizing the loss on the inferred pseudo-labels z, keeping H and A fixed.
- Iteration: Steps repeat until convergence of the loss or complexity-regularized objective. The overall score comprises both data fit and model complexity (Dai et al., 2020, Dai et al., 2021).
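The loop above can be sketched in Python. The callables `perceive`, `abduce`, `induce`, `fit_parameters`, and `score` are placeholders for the components described in this section, not actual Meta₍Abd₎ APIs:

```python
def meta_abd_loop(data, bk, meta_rules, theta, perceive, abduce, induce,
                  fit_parameters, score, max_iters=15, tol=1e-4):
    """EM-style alternation of abduction, induction, and parameter fitting.

    Placeholder components (assumed signatures):
    - perceive(x, theta): soft assignment over symbolic pseudo-labels
    - abduce(probs, y, bk, hypothesis): minimal explanatory ground atoms
    - induce(bk, abducibles, meta_rules): logic program from meta-rules
    - fit_parameters(theta, pseudo_labels): e.g. one gradient step
    - score(...): data fit plus complexity penalty
    """
    hypothesis, prev = [], float("inf")
    for _ in range(max_iters):
        # E-step: perceive each input, then abduce explanatory groundings
        abducibles = []
        for x, y in data:
            probs = perceive(x, theta)
            abducibles.append(abduce(probs, y, bk, hypothesis))
        # Induce a program covering the abduced facts via meta-rules
        hypothesis = induce(bk, abducibles, meta_rules)
        # M-step: refit parameters against the inferred pseudo-labels
        theta = fit_parameters(theta, abducibles)
        current = score(data, bk, hypothesis, abducibles, theta)
        if abs(prev - current) < tol:  # convergence of regularized objective
            break
        prev = current
    return hypothesis, theta
```

This captures only the control flow; real implementations interleave abduction and induction inside a single logical inference procedure.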
The following table summarizes the main steps in the learning loop:
| Step | Description | Output |
|---|---|---|
| Sub-symbolic | Map raw input x to P(z \| x, θ) | Probabilistic facts |
| Abduction | Find A s.t. BK ∪ H ∪ A ⊨ y | Explanatory groundings |
| Induction | Induce H from BK and A using meta-rules | Logic program clauses |
| Parameter fit | Optimize θ for best data fit given H and A | Updated parameters |
3. Meta-Interpretive Induction and Predicate Invention
Meta₍Abd₎ leverages meta-interpretive learning (MIL) to induce logic programs by unfolding second-order meta-rule templates such as the chain rule P(X, Y) ← Q(X, Z), R(Z, Y), whose instantiations (e.g., with R = P) yield recursive definitions. Instantiations are guided by example coverage and regularized to promote concise, reusable theories. Clause synthesis is constrained by mode declarations specifying argument types and allowed predicate shapes.
Predicate invention is enabled by meta-rules that allow the introduction of auxiliary predicates not present in the background knowledge BK. This permits the discovery of highly abstracted or recursively defined logic subprograms, enhancing both expressivity and extrapolation capability (Dai et al., 2020).
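A simplified view of how chain meta-rule instantiation opens the door to predicate invention can be sketched as enumeration over a predicate signature extended with fresh symbols. This is illustrative only; a real MIL system interleaves instantiation with proof search rather than enumerating up front:

```python
from itertools import product

def chain_instantiations(target, background_preds, n_invented=1):
    """Enumerate instantiations of the chain meta-rule
    P(X,Y) :- Q(X,Z), R(Z,Y) for `target`, allowing fresh invented
    symbols (inv_1, ...) and the target itself (for recursion) in
    body positions."""
    invented = [f"inv_{i}" for i in range(1, n_invented + 1)]
    body_preds = background_preds + invented + [target]
    clauses = [f"{target}(X,Y) :- {q}(X,Z), {r}(Z,Y)."
               for q, r in product(body_preds, repeat=2)]
    # Each invented symbol also needs defining clauses of its own.
    for p in invented:
        for q, r in product(background_preds + [p], repeat=2):
            clauses.append(f"{p}(X,Y) :- {q}(X,Z), {r}(Z,Y).")
    return clauses
```

Even this toy enumeration shows why mode declarations and complexity penalties matter: the candidate space grows quadratically in the signature size per meta-rule, and invented symbols multiply it further.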
4. Abduction from Raw and Noisy Data
Abduction in Meta₍Abd₎ serves to bridge the gap from raw or noisy observations to the symbolic level. The abduction step hypothesizes a minimal set of ground atoms (e.g., categorical labels, constraints, unknown intermediates) such that—conditional on current and —the observations are entailed with high probability. In practical implementations, abduction is performed via greedy or branch-and-bound search with probabilistic pruning, and, when applicable, arithmetic constraints are solved by CLP(Z) or similar solvers.
In the biodesign domain, abduction hypothesizes mechanistic constants (e.g., reaction rates) that explain time-series protein concentrations. In vision tasks, abduction selects maximal-probability label groundings needed to explain sequence-level targets, such as arithmetic operations over perceived digits (Dai et al., 2021, Dai et al., 2020).
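For the digit-arithmetic setting, the maximal-probability abduction can be sketched as a best-first search over label groundings whose sum entails the observed target. This is an illustrative sketch of the search style (greedy/branch-and-bound with pruning), not the paper's implementation:

```python
import heapq
import math

def abduce_labels(probs, target, domain=range(10)):
    """Best-first search for the maximal-probability digit labels whose
    sum equals `target`. `probs` holds one {digit: probability} dict per
    perceived image. Returns the best grounding, or None if unsatisfiable."""
    frontier = [(0.0, [])]  # (negative log-probability, labels so far)
    while frontier:
        neg_logp, labels = heapq.heappop(frontier)
        if len(labels) == len(probs):
            if sum(labels) == target:  # symbolic entailment check
                return labels
            continue
        for d in domain:
            if sum(labels) + d > target:
                continue  # arithmetic pruning: digits are nonnegative
            p = probs[len(labels)].get(d, 0.0)
            if p > 0.0:
                heapq.heappush(frontier, (neg_logp - math.log(p), labels + [d]))
    return None
```

The priority queue orders partial groundings by joint probability, so the first complete grounding that satisfies the arithmetic constraint is the maximal-probability abduction; CLP(Z)-style constraint solving replaces the naive sum check in richer arithmetic domains.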
5. Knowledge Representation and Reusability
All learned programs and background modules are represented in first-order logic, with strict separation between background, induced, and abducible predicates. Numerical submodules, such as ODE solvers, are integrated via predicate interfaces, permitting joint learning of mechanistic and empirical models.
A crucial feature of Meta₍Abd₎ is knowledge reuse. Induced logic programs—such as recursive sum/product or sorting predicates—can be incorporated as background in subsequent tasks, enabling transfer learning and incremental theory construction. Invented predicates are treated as native in future induction, further compounding reusability (Dai et al., 2020).
6. Complexity, Optimization, and Empirical Properties
The dominant complexity in Meta₍Abd₎ arises from the combinatorial explosion of meta-rule instantiations and the search over abducible groundings. The worst-case search is exponential; however, empirical performance in both vision and synthetic biology applications demonstrates tractable runtimes attributed to efficient pruning strategies (e.g., greedy or A*-like branch-and-bound), data-efficient induction, and disciplined meta-rule templates.
In biodesign applications, convergence is typically achieved within 10–15 major iterations, with per-iteration cost dominated by clause-expansion steps during induction. For medium-scale datasets (e.g., 50 time-series), end-to-end learning completes in 1–2 hours on prototypical hardware (Dai et al., 2021).
7. Practical Applications and Experimental Results
Meta₍Abd₎ has demonstrated significant capability in diverse problem domains:
- Arithmetic Induction from MNIST: Achieved 95–98% classification accuracy and low mean absolute error on cumulative sum/product tasks, with strong extrapolation to sequences an order of magnitude longer than those seen during training. End-to-end RNN and LSTM baselines failed to generalize (≈10% accuracy), while DeepProbLog was computationally intractable in these settings (Dai et al., 2020).
- Sorting Task: Induced an invented "sorted" predicate in a hierarchical fashion, enabling permutation prediction on image sequences at 91% exact accuracy for length-5 inputs (lower for length-7), outperforming NeuralSort and Neural Logic Machines (Dai et al., 2020).
- Synthetic Biology (Three-Gene Operon): Symbolically recovered mechanistic ODE structures, abduced reaction-rate constants within 5% of ground-truth, and minimized experimental cost by integrating active learning. Only 20 designed experiments sufficed for complete structural and parameter recovery versus a combinatorial base of 54 (Dai et al., 2021).
The following are representative induced program fragments:
```prolog
% Cumulative sum
f([H], H).
f([X,Y|T], Z) :- add(X, Y, N), f([N|T], Z).

% Invented "sorted" predicate
s([A,B]) :- nn_pred(A,B).
s([A,B|T]) :- nn_pred(A,B), s([B|T]).
```
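For readers less familiar with Prolog, the cumulative-sum fragment corresponds to a simple recursive fold. A Python analogue, assuming `add` denotes ordinary addition over the abduced digit values:

```python
def f(xs):
    """Python analogue of the induced cumulative-sum program:
    f([H], H).  f([X,Y|T], Z) :- add(X, Y, N), f([N|T], Z)."""
    if len(xs) == 1:       # base case: f([H], H).
        return xs[0]
    x, y, *t = xs          # recursive case: fold the first two elements
    return f([x + y] + t)
```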
References
- "Abductive Knowledge Induction From Raw Data" (Dai et al., 2020)
- "Automated Biodesign Engineering by Abductive Meta-Interpretive Learning" (Dai et al., 2021)