
Differentiable Logic Machines

Updated 5 February 2026
  • Differentiable Logic Machines (DLMs) are predicate-centric neural-logic frameworks that learn interpretable first-order logic programs via continuous relaxation and gradient descent.
  • They integrate incremental curriculum training with supervised ILP and reinforcement learning to extract concise, symbolic logic formulas efficiently.
  • Empirical evaluations reveal that DLMs outperform previous models on ILP and RL benchmarks, showing improvements in scalability, efficiency, and memory usage.

Differentiable Logic Machines (DLMs) provide a predicate-centric neural-logic framework that enables the learning of interpretable first-order logic (FOL) programs through continuous relaxation and gradient descent. Unlike traditional neural-symbolic approaches that allocate parameters to rules or entire programs, DLMs assign trainable weights to predicates, facilitating both scalability and interpretability. This architecture supports both Inductive Logic Programming (ILP) and Reinforcement Learning (RL) tasks, allowing for the recovery of symbolic, human-interpretable logic formulas from neural models while maintaining efficient training and inference properties (Zimmer et al., 2021).

1. Predicate-Centric Architecture and Continuous Relaxation

DLMs are constructed as layered networks, parameterizing logic formulas via learnable weights $\theta_p$ placed on each candidate predicate $p$ in every layer. The scores $\theta_p$ are converted to a probability vector $w$ using a temperature-controlled softmax:

$$w_p = \mathrm{softmax}(\theta/\tau)_p = \frac{\exp(\theta_p/\tau)}{\sum_q \exp(\theta_q/\tau)}.$$

Each layer $l$ and breadth $b$ maintains corresponding sets of $b$-ary predicate tensors $P \in \mathbb{R}^{m \times \cdots \times m}$, with entries $P(i_1, \dots, i_b) \in [0, 1]$ representing fuzzy truth values for grounded atoms over $m$ constants.
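
The temperature-controlled selection above can be sketched in a few lines of NumPy; the function name and example scores are illustrative, not the paper's implementation.

```python
import numpy as np

def predicate_weights(theta, tau=1.0):
    """Temperature-controlled softmax over predicate scores theta.

    Low tau pushes w toward a one-hot (near-discrete) selection;
    high tau keeps the mixture soft."""
    z = theta / tau
    z = z - z.max()            # stabilize the exponentials
    w = np.exp(z)
    return w / w.sum()

theta = np.array([2.0, 0.5, -1.0])
w_soft = predicate_weights(theta, tau=1.0)   # soft mixture over predicates
w_hard = predicate_weights(theta, tau=0.05)  # near one-hot selection
```

Annealing $\tau$ toward zero during training makes the soft mixture converge to a discrete predicate choice, which is what later enables program extraction.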

Logical operators are relaxed via fuzzy logic:

  • Fuzzy conjunction: $x \land y \approx x \cdot y$ (componentwise product)
  • Fuzzy disjunction: $x \lor y \approx x + y - x \cdot y$ (the dual probabilistic sum)
  • Fuzzy negation: $\lnot x \approx 1 - x$

Each conjunction or disjunction module takes convex combinations (weighted sums) of input predicates using the softmax weights $w$, then applies the appropriate fuzzy logic operation. These modules are composed recursively—breadth by breadth, layer by layer—producing at the final layer a tensor encoding the target logical relation.
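
A minimal sketch of one such module, assuming the product-logic relaxation above (names and shapes are illustrative, not the paper's code):

```python
import numpy as np

def fuzzy_and(x, y):   # product t-norm: the componentwise product above
    return x * y

def fuzzy_or(x, y):    # dual probabilistic sum
    return x + y - x * y

def fuzzy_not(x):
    return 1.0 - x

def soft_module(preds, w_left, w_right, op=fuzzy_and):
    """preds: same-shape fuzzy predicate tensors with entries in [0, 1];
    w_left / w_right: softmax weight vectors selecting the two operands."""
    stack = np.stack(preds)                      # (num_preds, m, m)
    left = np.tensordot(w_left, stack, axes=1)   # convex combination
    right = np.tensordot(w_right, stack, axes=1)
    return op(left, right)

p1 = np.array([[1.0, 0.0], [0.2, 0.9]])
p2 = np.array([[0.5, 1.0], [0.8, 0.1]])
out = soft_module([p1, p2], np.array([1.0, 0.0]), np.array([0.0, 1.0]))
# with exactly one-hot weights, out equals the componentwise product p1 * p2
```

When the selection weights are one-hot, the module degenerates to an exact fuzzy operation on two chosen predicates, which is why discretization after training preserves behavior.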

The network’s forward pass can be formalized as:

$$P^{(l)} = f_{\mathrm{op}}\!\Big(\sum_{p \in \mathcal{C}^{(l)}} w_p \, P_p\Big),$$

where $\mathcal{C}^{(l)}$ denotes the pool of candidate $b$-ary input predicates, augmented by expansion, reduction (existential and universal quantification over one argument), permutation, negation, and a constant True predicate, and $f_{\mathrm{op}}$ is the module's fuzzy conjunction or disjunction.

This predicate-centric relaxation ensures that parameter count scales linearly with the number of predicates, in contrast to exponential growth when parameterizing rules, thereby supporting tractability on large domains.

2. Training Methodologies: Supervised ILP and RL

DLMs are trained using both supervised ILP loss and actor–critic RL objectives.

  • Supervised ILP: The objective is to match the output predicate tensor $\hat{P}$ to the ground truth $P^{*}$ via binary cross-entropy over all grounded tuples $(i_1, \dots, i_b)$:

$$\mathcal{L}_{\mathrm{ILP}} = -\sum_{i_1, \dots, i_b} \Big[ P^{*}(i_1, \dots, i_b) \log \hat{P}(i_1, \dots, i_b) + \big(1 - P^{*}(i_1, \dots, i_b)\big) \log\big(1 - \hat{P}(i_1, \dots, i_b)\big) \Big].$$

Stochastic gradient descent is used, with Gumbel-Softmax noise and annealing of the temperature $\tau$ to encourage near-discrete program selection, plus dropout for sparsity.
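
The two ingredients can be sketched as follows; `bce` and `gumbel_softmax_weights` are illustrative helper names, not the paper's API, and the Gumbel noise uses the standard $-\log(-\log u)$ construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy averaged over all grounded tuples."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))

def gumbel_softmax_weights(theta, tau, rng):
    """Softmax over Gumbel-perturbed scores; annealing tau toward zero
    pushes the selection toward a discrete (one-hot) choice."""
    u = np.clip(rng.uniform(size=theta.shape), 1e-12, 1.0 - 1e-12)
    g = -np.log(-np.log(u))        # standard Gumbel(0, 1) noise
    z = (theta + g) / tau
    z = z - z.max()                # numerical stability
    w = np.exp(z)
    return w / w.sum()

theta = np.array([1.0, 0.2, -0.5])
for tau in (1.0, 0.5, 0.1):       # temperature annealing across training
    w = gumbel_softmax_weights(theta, tau, rng)

loss = bce(np.array([0.9, 0.1]), np.array([1.0, 0.0]))
```

The noisy selection regularizes training while the annealed temperature gradually commits the network to a single predicate per slot.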

  • Reinforcement Learning (Actor–Critic PPO): The DLM actor outputs an action tensor from input predicates, applying a low-temperature softmax over feasible actions. Proximal Policy Optimization (PPO) is employed:

$$\mathcal{L}_{\mathrm{PPO}}(\theta) = \mathbb{E}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\big)\right],$$

where $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ is the policy probability ratio and $\hat{A}_t$ is the GAE-estimated advantage.
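
The clipped surrogate can be computed directly from log-probabilities and advantages; this is the standard PPO form, sketched here with illustrative names rather than the paper's code.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Clipped PPO surrogate (to be maximized; negate for gradient descent).
    logp_*: log-probabilities of the taken actions under the new/old policy;
    adv: GAE-estimated advantages."""
    ratio = np.exp(logp_new - logp_old)                    # r_t(theta)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return np.mean(np.minimum(unclipped, clipped))

logp_old = np.log(np.array([0.5, 0.5]))
logp_new = np.log(np.array([0.8, 0.3]))
adv = np.array([1.0, -1.0])
surrogate = ppo_clip_loss(logp_new, logp_old, adv)  # ≈ 0.2 for these inputs
```

The elementwise minimum keeps the update conservative whenever the new policy drifts more than a factor of $1 \pm \epsilon$ from the old one.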

The critic network consists of gated recurrent units (GRUs), one per argument dimension, that scan predicate tensor slices; embeddings are concatenated and fused with an MLP to estimate the state value $V(s)$. The critic loss is mean squared error.

3. Incremental and Progressive Training for Deep Logic Discovery

To effectively learn deep logic formulas, DLMs are trained incrementally:

  • The process begins with an initial set of base predicates $\mathcal{P}_0$.
  • Once the current DLM block is trained, invented predicates with high discrete selection weights are extracted as auxiliary predicates.
  • The base set is augmented with these auxiliary predicates, and a new DLM block is trained on top.
  • This process is repeated for multiple phases until an interpretable and accurate logic program is produced.

In RL, each phase can leverage either supervised imitation of an expert policy or PPO-based RL to augment interpretability. This curriculum stacking enables the modeling of complex, deep logic programs that would otherwise be hard to discover in a single stage.
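
The curriculum above can be summarized as a simple control loop; the helpers here are toy stand-ins (not the paper's API), so only the phase structure is meaningful.

```python
# Schematic of the incremental curriculum: train a block, keep invented
# predicates with high discrete selection weight, grow the base set, repeat.

def train_block(predicates):
    # Toy stand-in for training one DLM block: returns a "model" and the
    # invented predicates with their discrete selection weights.
    invented = {f"aux_{len(predicates)}": 0.95, "noise": 0.10}
    return {"inputs": list(predicates)}, invented

def train_incrementally(initial_predicates, num_phases, threshold=0.9):
    predicates = list(initial_predicates)
    model = None
    for _ in range(num_phases):
        model, invented = train_block(predicates)
        # keep only invented predicates selected with high discrete weight
        predicates += [p for p, w in invented.items() if w >= threshold]
    return model, predicates

model, preds = train_incrementally(["parent", "eq"], num_phases=2)
# preds grows by one reliable auxiliary predicate per phase
```

Each phase thus searches a shallow program over an enriched vocabulary, rather than a deep program over the original one.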

4. Extraction and Discretization of Interpretable Logic Programs

After training, the continuously relaxed logic program is discretized by converting predicate weights to one-hot selections (argmax) or by thresholding. This reduces the fuzzy operations to classical Boolean connectives (AND, OR, NOT) and yields a symbolic FOL program executable via Boolean tensors. The program extraction traverses the dependency directed acyclic graph (DAG) from target down to initial predicates, compiling a set of Horn-style clauses.

The resulting logic program is concise (typically under 200 clauses for complex RL domains) and admits direct execution with computational cost polynomial in the number of constants $m$.
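
Discretization itself is mechanical, as a small NumPy sketch (with illustrative names) shows:

```python
import numpy as np

def discretize_weights(theta):
    """Collapse a soft predicate selection to a one-hot choice via argmax."""
    w = np.zeros_like(theta)
    w[np.argmax(theta)] = 1.0
    return w

def to_boolean(pred, threshold=0.5):
    """Threshold a fuzzy predicate tensor to a classical Boolean one."""
    return pred >= threshold

theta = np.array([0.1, 2.3, -0.4])
one_hot = discretize_weights(theta)                 # selects predicate index 1
fuzzy = np.array([[0.98, 0.03], [0.40, 0.77]])
boolean = to_boolean(fuzzy)
# after discretization, the fuzzy ops reduce to np.logical_and / np.logical_or
```

Because training anneals the selection weights toward one-hot vectors anyway, this final snap to Boolean semantics usually changes the program's behavior very little.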

5. Performance, Scaling, and Empirical Results

DLMs were evaluated on standard differentiable ILP and RL benchmarks, outperforming or matching previous neural-logic models such as ∂ILP and NLM in several dimensions.

  • ILP — Benchmarks: Family Tree, Graph, 2-outdegree, Zip. Baselines: ∂ILP, NLM. DLM performance: 100% success across 20 seeds (L=4, B=2); up to 3.5× higher “successful seed” rate; extracted programs run 5–20× faster and are 10–100× more memory-efficient at test time.
  • RL — Benchmarks: Blocksworld (Stack, Unstack, On), Sorting, Path. Baseline: NLM. DLM performance: matches NLM rewards (≈0.92); Sorting reaches 0.939 (m=10) and generalizes to larger instances at 0.559 (+0.3% over NLM); in the Path domain, four stacked DLM blocks achieve 97% success and generalize to block configurations substantially larger than those seen during training.

DLMs enable extracted programs to scale to much larger numbers of constants than prior state-of-the-art systems, with parameter count scaling linearly in the number of candidate predicates and test-time inference cost polynomial in the number of constants. When interpretability is enforced, the reward loss in RL is under 5%, while the recovered program remains concise.

6. Significance and Applications

DLMs advance neural-symbolic systems by offering a scalable, interpretable, and efficient architecture for learning first-order logic programs. Their ability to recover symbolic programs—rather than merely making predictions—positions them for use-cases requiring interpretability, verification, and reasoning over variable-size domains. The continuous relaxation approach enables direct optimization via gradient descent, supporting both supervised ILP and RL tasks within a unified framework (Zimmer et al., 2021).

A plausible implication is that predicate-centric parameterization could mitigate combinatorial explosion associated with rule learning in classical neural-logic architectures. Incremental curriculum training further empowers DLMs to solve domains previously inaccessible to end-to-end differentiable ILP systems.

7. Limitations and Research Directions

While DLMs demonstrate favorable scaling and interpretability, the architecture is currently restricted to specific classes of logic programs (e.g., Horn clauses and certain modular compositions). The fuzzy relaxation and subsequent discretization may limit fidelity for some logic forms. A plausible direction for future research includes extending DLMs to richer logics, integrating background knowledge more flexibly, and exploring broader applications in domains such as program synthesis and scientific reasoning.

For comprehensive implementation details, empirical analyses, and comparisons to prior work, see "Differentiable Logic Machines" (Zimmer et al., 2021).
