Differentiable Logic Machines
- Differentiable Logic Machines (DLMs) are predicate-centric neural-logic frameworks that learn interpretable first-order logic programs via continuous relaxation and gradient descent.
- They integrate incremental curriculum training with supervised ILP and reinforcement learning to extract concise, symbolic logic formulas efficiently.
- Empirical evaluations reveal that DLMs outperform previous models on ILP and RL benchmarks, showing improvements in scalability, efficiency, and memory usage.
Differentiable Logic Machines (DLMs) provide a predicate-centric neural-logic framework that enables the learning of interpretable first-order logic (FOL) programs through continuous relaxation and gradient descent. Unlike traditional neural-symbolic approaches that allocate parameters to rules or entire programs, DLMs assign trainable weights to predicates, facilitating both scalability and interpretability. This architecture supports both Inductive Logic Programming (ILP) and Reinforcement Learning (RL) tasks, allowing for the recovery of symbolic, human-interpretable logic formulas from neural models while maintaining efficient training and inference properties (Zimmer et al., 2021).
1. Predicate-Centric Architecture and Continuous Relaxation
DLMs are constructed as layered networks that parameterize logic formulas via learnable weights placed on each candidate predicate in every layer. These selection weights are converted to a probability vector using a temperature-controlled softmax:

$$\sigma_\tau(w)_i = \frac{\exp(w_i / \tau)}{\sum_j \exp(w_j / \tau)}$$

Each layer and breadth maintains corresponding sets of $r$-ary predicate tensors $P \in [0, 1]^{m^r}$, with entries representing fuzzy truth values for grounded atoms over the $m$ constants of the domain.
Logical operators are relaxed via fuzzy logic:
- Fuzzy conjunction: $a \wedge b := a \cdot b$ (componentwise product)
- Fuzzy disjunction: $a \vee b := a + b - a \cdot b$ (probabilistic sum)
- Fuzzy negation: $\neg a := 1 - a$
Each conjunction or disjunction module takes convex combinations (weighted sums) of input predicates, with mixture weights given by a temperature-controlled softmax over the module's selection weights, then applies the appropriate fuzzy logic operation. These modules are composed recursively—breadth by breadth, layer by layer—producing at the final layer a tensor encoding the target logical relation.
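As an illustration of these relaxed operators and of softmax-weighted predicate selection, here is a minimal NumPy sketch; the function names (`conjunction_module`, etc.) and the two-input module shape are expository assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(w, tau=1.0):
    """Temperature-controlled softmax over predicate-selection weights."""
    z = np.exp((w - w.max()) / tau)
    return z / z.sum()

def fuzzy_and(a, b):
    return a * b                  # componentwise product

def fuzzy_or(a, b):
    return a + b - a * b          # probabilistic sum

def fuzzy_not(a):
    return 1.0 - a

def conjunction_module(predicates, w1, w2, tau=0.1):
    """Select two soft inputs via convex combinations of the candidate
    pool, then apply the fuzzy conjunction componentwise.

    `predicates` is a list of same-shape tensors of fuzzy truth values.
    """
    stack = np.stack(predicates)                       # (K, ...) candidate pool
    p = np.tensordot(softmax(w1, tau), stack, axes=1)  # first convex combination
    q = np.tensordot(softmax(w2, tau), stack, axes=1)  # second convex combination
    return fuzzy_and(p, q)
```

At a low temperature `tau`, each softmax is nearly one-hot, so the module effectively picks one predicate per slot, which is what makes later discretization faithful.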
The network’s forward pass, for a single module at layer $l$, can be formalized as:

$$P_{\text{out}} = g\Big(\sum_i \sigma_\tau(w)_i \, P_i\Big), \qquad P_i \in \mathcal{P}^{(l)},$$

where $g$ is the module's fuzzy operation and $\mathcal{P}^{(l)}$ denotes the pool of candidate $r$-ary input predicates, augmented by expansion, reduction (fuzzy existential and universal quantification, realized as max and min over one argument), permutation of arguments, negation, and a constant true predicate.
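The pool-augmentation operations can be sketched for binary predicates over $m$ constants as follows; this is a hedged NumPy illustration in which the helper names, and the use of max/min for existential/universal reduction, follow the standard fuzzy reading rather than a confirmed detail of the paper's code:

```python
import numpy as np

def expand(p_unary, m):
    """Expansion: lift a unary predicate P(x) to a binary one P'(x, y) = P(x)."""
    return np.broadcast_to(p_unary[:, None], (m, m)).copy()

def reduce_exists(p_binary):
    """Fuzzy existential reduction: Q(x) = max_y P(x, y)."""
    return p_binary.max(axis=-1)

def reduce_forall(p_binary):
    """Fuzzy universal reduction: Q(x) = min_y P(x, y)."""
    return p_binary.min(axis=-1)

def permute(p_binary):
    """Argument permutation: Q(x, y) = P(y, x)."""
    return p_binary.T

def candidate_pool(p_binary):
    """Assemble an augmented pool of binary candidates for one module."""
    m = p_binary.shape[0]
    return [
        p_binary,                                # the predicate itself
        permute(p_binary).copy(),                # permuted arguments
        1.0 - p_binary,                          # negation
        expand(reduce_exists(p_binary), m),      # reduce, then re-expand
        np.ones((m, m)),                         # constant true predicate
    ]
```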
This predicate-centric relaxation ensures that parameter count scales linearly with the number of predicates, in contrast to exponential growth when parameterizing rules, thereby supporting tractability on large domains.
2. Training Methodologies: Supervised ILP and RL
DLMs are trained using both supervised ILP loss and actor–critic RL objectives.
- Supervised ILP: The objective is to match the output predicate tensor $\hat{Y}$ to the ground truth $Y$ via binary cross-entropy over all grounded $r$-tuples of constants $\mathbf{c}$:

$$\mathcal{L}_{\text{ILP}} = -\sum_{\mathbf{c}} \Big[ Y(\mathbf{c}) \log \hat{Y}(\mathbf{c}) + \big(1 - Y(\mathbf{c})\big) \log\big(1 - \hat{Y}(\mathbf{c})\big) \Big]$$
Stochastic gradient descent is used, with Gumbel-Softmax noise and temperature ($\tau$) annealing to encourage near-discrete program selection, and dropout for sparsity.
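A minimal NumPy sketch of this loss and of Gumbel-perturbed, temperature-annealed selection weights; the function signatures are illustrative, not the paper's API:

```python
import numpy as np

def bce_loss(y_pred, y_true, eps=1e-8):
    """Binary cross-entropy averaged over all grounded tuples of the target."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def gumbel_softmax_weights(w, tau, rng):
    """Perturb selection logits with Gumbel noise, then apply a softmax whose
    temperature tau is annealed toward 0 to push selections near one-hot."""
    g = -np.log(-np.log(rng.uniform(1e-10, 1.0, size=w.shape)))
    z = (w + g) / tau
    z = np.exp(z - z.max())
    return z / z.sum()
```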
- Reinforcement Learning (Actor–Critic PPO): The DLM actor outputs an action tensor from input predicates, applying a low-temperature softmax over feasible actions. Proximal Policy Optimization (PPO) is employed:

$$\mathcal{L}^{\text{CLIP}}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\big)\Big],$$

where $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)$ and $\hat{A}_t$ is the GAE-estimated advantage.
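The clipped surrogate and a GAE advantage estimator can be sketched as follows; this is standard PPO machinery rather than DLM-specific code, with illustrative signatures:

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized)."""
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.
    `values` has one extra entry: the bootstrap value of the final state."""
    deltas = rewards + gamma * values[1:] - values[:-1]
    adv = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv
```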
The critic network consists of gated recurrent units (GRUs), one per argument dimension, that scan predicate tensor slices; the resulting embeddings are concatenated and fused with an MLP to estimate the state value $V(s)$. The critic loss is the mean squared error.
3. Incremental and Progressive Training for Deep Logic Discovery
To effectively learn deep logic formulas, DLMs are trained incrementally:
- The process begins with an initial set of base predicates $\mathcal{P}_0$.
- Once the current DLM block is trained, invented predicates with high discrete selection weights are extracted as auxiliary predicates $\mathcal{A}_k$.
- The base set is then augmented, $\mathcal{P}_{k+1} = \mathcal{P}_k \cup \mathcal{A}_k$, and a new DLM block is trained.
- This process is repeated for phases $k = 0, 1, 2, \dots$ until an interpretable and accurate logic program is produced.
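The curriculum can be sketched as the following minimal Python loop; `train_dlm_block`, the three-predicates-per-phase stub, and the `threshold` value are hypothetical stand-ins rather than the paper's actual API:

```python
import numpy as np

def train_dlm_block(base_predicates, rng):
    # Stand-in for a full DLM training phase: returns invented predicates
    # paired with their (near-discrete) selection weights.
    return [(f"aux_{len(base_predicates)}_{i}", rng.uniform()) for i in range(3)]

def incremental_training(initial_predicates, phases=3, threshold=0.8, seed=0):
    """Grow the predicate base phase by phase, keeping only invented
    predicates whose selection weight is high, as in the curriculum above."""
    rng = np.random.default_rng(seed)
    predicates = list(initial_predicates)
    for _ in range(phases):
        invented = train_dlm_block(predicates, rng)
        kept = [name for name, w in invented if w >= threshold]
        predicates.extend(kept)   # augmented base for the next block
    return predicates
```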
In RL, each phase can leverage either supervised imitation of an expert policy or PPO-based RL to augment interpretability. This curriculum stacking enables the modeling of complex, deep logic programs that would otherwise be hard to discover in a single stage.
4. Extraction and Discretization of Interpretable Logic Programs
After training, the continuously relaxed logic program is discretized by converting predicate weights to one-hot selections (argmax) or by thresholding. This reduces the fuzzy operations to classical Boolean operations ($\wedge$, $\vee$, $\neg$) and yields a symbolic FOL program executable via Boolean tensors. The program extraction traverses the dependency directed acyclic graph (DAG) from target down to initial predicates, compiling a set of Horn-style clauses.
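A minimal sketch of the two discretization steps, argmax on selection weights and thresholding on fuzzy truth values; the function names are illustrative:

```python
import numpy as np

def discretize_weights(w):
    """Convert soft selection weights to a one-hot choice via argmax."""
    onehot = np.zeros_like(w)
    onehot[np.argmax(w)] = 1.0
    return onehot

def discretize_predicate(p, threshold=0.5):
    """Threshold fuzzy truth values to Boolean tensors for symbolic execution."""
    return p >= threshold
```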
The resulting logic program is concise (typically under 200 clauses for complex RL domains) and admits direct execution, with computational cost growing polynomially with the number of constants.
5. Performance, Scaling, and Empirical Results
DLMs were evaluated on standard differentiable ILP and RL benchmarks, outperforming or matching previous neural-logic models such as ∂ILP and NLM in several dimensions.
| Task Type | Main Datasets/Benchmarks | SOTA Baselines | DLM Performance |
|---|---|---|---|
| ILP | Family Tree, Graph, 2-outdegree, Zip | ∂ILP, NLM | 100% success across 20 seeds (L=4, B=2); up to 3.5× higher “successful seed” rate; extracted programs run 5–20× faster and are 10–100× more memory efficient at test time |
| RL | Blocksworld (Stack, Unstack, On), Sorting, Path | NLM | Matches NLM rewards on Blocksworld; competitive Sorting rewards (0.939 at m=10) with generalization to larger m; four stacked DLM blocks achieve 97% success on Path; generalizes to block configurations substantially larger than those seen during training |
DLMs enable extracted programs to scale to far larger numbers of constants than prior state-of-the-art models: parameter count grows linearly with the number of candidate predicates, and test-time inference cost grows only polynomially with the number of constants. When interpretability is enforced, the reward loss in RL is less than 5%, while the recovered program remains concise.
6. Significance and Applications
DLMs advance neural-symbolic systems by offering a scalable, interpretable, and efficient architecture for learning first-order logic programs. Their ability to recover symbolic programs—rather than merely making predictions—positions them for use-cases requiring interpretability, verification, and reasoning over variable-size domains. The continuous relaxation approach enables direct optimization via gradient descent, supporting both supervised ILP and RL tasks within a unified framework (Zimmer et al., 2021).
A plausible implication is that predicate-centric parameterization could mitigate combinatorial explosion associated with rule learning in classical neural-logic architectures. Incremental curriculum training further empowers DLMs to solve domains previously inaccessible to end-to-end differentiable ILP systems.
7. Limitations and Research Directions
While DLMs demonstrate favorable scaling and interpretability, the architecture is currently restricted to specific classes of logic programs (e.g., Horn clauses and certain modular compositions). The fuzzy relaxation and subsequent discretization may limit fidelity for some logic forms. A plausible direction for future research includes extending DLMs to richer logics, integrating background knowledge more flexibly, and exploring broader applications in domains such as program synthesis and scientific reasoning.
For comprehensive implementation details, empirical analyses, and comparisons to prior work, see "Differentiable Logic Machines" (Zimmer et al., 2021).