ChaosBench-Logic Benchmarking

Updated 12 January 2026

ChaosBench-Logic is a suite of canonical benchmarks and datasets that rigorously test symbolic reasoning and chaos detection in nonlinear dynamical systems.
It employs analytical models like the Boolean Glass network and formal first-order logic frameworks to evaluate algorithmic and logical performance.
The framework also includes engineering benchmarks for chaos-based logic operations, highlighting metrics for energy efficiency, noise tolerance, and computational robustness.

ChaosBench-Logic refers to a family of canonical benchmarks, datasets, and structured test-cases designed to evaluate symbolic reasoning, algorithmic detection, and computational logic over chaotic dynamical systems. This term encompasses several lines of research, including (1) mathematically minimal dynamical systems that manifest analytically verifiable chaos for benchmarking detection algorithms, (2) datasets and formal logic task suites that test logical and symbolic reasoning about chaos for LLMs and reasoning engines, and (3) engineering realizations of chaos-based logic operations for unconventional computing and logic gate construction. Its scope spans nonlinear dynamics, mathematical logic, symbolic computation, and machine learning, all unified by the theme of rigorously diagnosing and exploiting chaos in a logic-grounded, principled fashion.

1. Foundational Mathematical Example: The Boolean Glass Network

The original “ChaosBench-Logic” minimal-case system, introduced in (Edwards, 2019), is a four-dimensional continuous-time Boolean Glass network defined on $\mathbb{R}^4$ by

$\begin{aligned} y_1' &= -y_1 + 2[y_3] - 1 \\ y_2' &= -y_2 + 2[1 - y_1 - y_3 + y_3 y_4] - 1 \\ y_3' &= -y_3 + 2[(1-y_1)(1-y_4) + y_2 y_4] - 1 \\ y_4' &= -y_4 + 2[(1-y_1)(1-y_3) + y_1 y_2] - 1 \end{aligned}$

where each $[\cdot]$ denotes a Boolean-valued step function (0/1). The key properties are:

Uniform decay and no self-input ensure piecewise-linear bounded trajectories with switch-induced nonlinearity.
The system induces a family of discrete-time fractional-linear return maps at each orthant boundary of the hypercube, which in general take the reduced form $M(y) = \frac{A y}{1 + \phi^T y}$ for $A \in \mathbb{R}^{3 \times 3}$, with $y$ restricted to a cone $C$ of admissible initial conditions.
For specific parameterizations and cycles (notably two distinct 8-step cycles), the embedding exhibits a Smale horseshoe-like geometry with a Cantor-like invariant set $A$ supporting infinitely many unstable periodic orbits and a positive Lyapunov exponent.
Explicit analytic construction yields the invariant cones, eigenstructure, and periods of orbits. The “1-cycle” has $\lambda_2 \approx 1.9457$ with period $P_1 = \ln(\lambda_2) \approx 0.6656$; the “0-cycle” lacks stable periodic orbits as its dominant eigenvector lies outside $C_0$.
The model serves as a rigorously transparent test-case for any symbolic or numerical chaos-detection algorithm—admitting full analytic solution, fractional-linear maps, and a benchmark horseshoe construction. It is minimal in dimension for the appearance of chaos in hard-switching continuous-time systems.
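As a rough illustration of the piecewise-linear dynamics, the sketch below integrates the four equations with a forward Euler step and records the sequence of orthants visited. It is not the analytic construction from (Edwards, 2019). It assumes one particular reading of the bracket notation: each state is first thresholded to a Boolean value ($Y_i = 1$ if $y_i > 0$, else 0), the bracketed expression is evaluated on those Boolean values, and the result is thresholded again. The function names, step size, and initial condition are illustrative choices.

```python
def step(value):
    """Boolean-valued step function: 1 if value > 0, else 0."""
    return 1 if value > 0 else 0


def boolean_inputs(y):
    """Evaluate the four bracketed terms on thresholded states (assumed reading)."""
    Y1, Y2, Y3, Y4 = (step(v) for v in y)
    return (
        step(Y3),
        step(1 - Y1 - Y3 + Y3 * Y4),
        step((1 - Y1) * (1 - Y4) + Y2 * Y4),
        step((1 - Y1) * (1 - Y3) + Y1 * Y2),
    )


def simulate(y0, dt=1e-3, steps=20000):
    """Forward Euler integration of y_i' = -y_i + 2*F_i - 1."""
    y = list(y0)
    trajectory = [tuple(y)]
    for _ in range(steps):
        F = boolean_inputs(y)
        y = [yi + dt * (-yi + 2 * Fi - 1) for yi, Fi in zip(y, F)]
        trajectory.append(tuple(y))
    return trajectory


def orthant_sequence(trajectory):
    """Collapse a trajectory into the sequence of distinct Boolean states visited."""
    sequence = []
    for point in trajectory:
        state = tuple(step(v) for v in point)
        if not sequence or sequence[-1] != state:
            sequence.append(state)
    return sequence


traj = simulate((0.3, -0.2, 0.5, -0.4))   # arbitrary initial condition
switches = orthant_sequence(traj)
print(len(switches), "orthant visits; first few:", switches[:5])
```

Because each component relaxes toward either $+1$ or $-1$ between switches, a trajectory started inside $[-1, 1]^4$ stays there, which is the boundedness property noted above. A fixed-step Euler scheme only approximates the switching times; an exact integrator would solve for each orthant-boundary crossing in closed form.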

2. Logical and Semantic Reasoning Benchmark: First-Order Logic Suite

“ChaosBench-Logic” also denotes a structured benchmark suite for evaluating the logical and symbolic reasoning ability of LLMs and automated reasoning systems over canonical dynamical systems (Thomas, 5 Jan 2026). Key elements include:

Ontology: 11 unary predicates in first-order logic.
Global axiom system: Horn-style implications that encode only unidirectional domain truths, without converses.

30 systems (chaotic ODEs, regular ODEs, discrete maps, PDEs, stochastic processes) with ground-truth predicate assignments.
621 questions across seven categories: atomic QA, multi-hop implication, inter-system analogy/non-analogy, counterfactual parameter change, bias resistance, multi-turn dialogue, concept-synthesis/compositional reasoning.
Metrics: Logical accuracy (per-item), dialogue-level accuracy (all turns must be correct), contradiction rate (axiom violation within model commitments). Open-source Python pipeline normalizes outputs and computes FOL closure.
Key empirical findings: Per-item accuracy for leading LLMs is 91–94%, but compositional reasoning is fragile (0% accuracy), and dialogue-level accuracy is markedly lower (53–76% depending on prompting mode/model). Primary failure mode: confusion of one-way and converse implications, especially in chaining scientific properties.
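The closure and contradiction checks described in this list can be sketched in a few lines. The example below is not the benchmark's own pipeline: the predicate names and the two axioms are hypothetical stand-ins chosen for illustration, and the function names are assumptions. It shows forward chaining over one-way implications, detection of an axiom violation in a set of model commitments, and the all-turns-correct rule for dialogue-level accuracy.

```python
# Hypothetical one-way axioms: premise predicate -> [(consequent, truth value)].
# These names are illustrative, not the benchmark's actual ontology.
AXIOMS = {
    "Chaotic": [("Deterministic", True), ("Periodic", False)],
    "Periodic": [("Deterministic", True)],
}


def closure(commitments):
    """Forward-chain the axioms from predicates committed as True.

    Returns (closed assignment, set of predicates whose committed or derived
    values conflict with an axiom consequence).
    """
    closed = dict(commitments)
    contradictions = set()
    frontier = [pred for pred, value in commitments.items() if value]
    while frontier:
        premise = frontier.pop()
        for pred, value in AXIOMS.get(premise, []):
            if pred in closed:
                if closed[pred] != value:
                    contradictions.add(pred)
            else:
                closed[pred] = value
                if value:
                    frontier.append(pred)
    return closed, contradictions


def contradiction_rate(all_commitments):
    """Fraction of commitment sets that violate at least one axiom."""
    flagged = sum(1 for c in all_commitments if closure(c)[1])
    return flagged / len(all_commitments)


def dialogue_accuracy(dialogues):
    """A dialogue counts as correct only if every turn is correct."""
    return sum(all(turns) for turns in dialogues) / len(dialogues)


closed, bad = closure({"Chaotic": True})
converse, _ = closure({"Deterministic": True})
print(closed, bad)
print("Chaotic" in converse)  # the converse is never derived
```

With these axioms, committing to `Chaotic` derives `Deterministic` as true and `Periodic` as false, while committing only to `Deterministic` derives nothing about `Chaotic`. A model that asserts both `Chaotic` and `Periodic` is flagged as contradictory, which is the kind of commitment the contradiction-rate metric counts.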

3. Engineering: Chaos-Based Logic Operations and Device Benchmarking

Engineering realizations of “ChaosBench-Logic” benchmark the physical substrates and their exploitation for computation via attractor hopping (Murali et al., 2018):

Physical system: Murali–Lakshmanan–Chua circuit, a piecewise-linear oscillator with two state variables, driven by two small-amplitude logic-input square waves, a bias, and a periodic forcing.
Logic encoding: Logical 0 and 1 are mapped to two distinct chaotic orbits lying in different phase-space half-planes. The output is read by thresholding a state variable, and a single device can yield complementary gates (OR/NOR, AND/NAND) in parallel.
Noise-assisted operation: For modest input amplitudes, adding zero-mean Gaussian noise yields “logical stochastic resonance” (LSR) at the attractor level: noise stabilizes logic operation over an intermediate window of noise strengths.
Benchmark metrics: Gate complexity, input amplification ratio (attractor amplitude/input), speed (switching latency), robustness across parameter space, noise tolerance, parallelism (dual outputs per device), and energy efficiency. These define comparative benchmarks for future chaos-computing platforms.
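The threshold readout and the complementary-gate idea can be illustrated without simulating the circuit itself. The sketch below assumes that a single scalar reading of a state variable is available for each input pair and that a positive reading encodes logical 1. The threshold, the function names, and the four readings are synthetic placeholders, not measurements from (Murali et al., 2018).

```python
def read_bits(x, threshold=0.0):
    """Threshold a state-variable reading into a bit and its complement."""
    bit = 1 if x > threshold else 0
    return bit, 1 - bit


def logic_or(a, b):
    return a | b


def logic_nor(a, b):
    return 1 - (a | b)


def logic_and(a, b):
    return a & b


def gate_success(readings, gate, output=0):
    """Fraction of readings whose selected output matches the gate truth table.

    output=0 selects the direct bit, output=1 the complementary bit.
    """
    hits = sum(
        1 for (i1, i2), x in readings if read_bits(x)[output] == gate(i1, i2)
    )
    return hits / len(readings)


# Synthetic readings (illustrative only): ((input1, input2), state value).
readings = [((0, 0), -1.1), ((0, 1), 0.9), ((1, 0), 1.0), ((1, 1), 1.2)]

print(gate_success(readings, logic_or, output=0))   # direct output as OR
print(gate_success(readings, logic_nor, output=1))  # complement as NOR
print(gate_success(readings, logic_and, output=0))  # same readings scored as AND
```

For these placeholder readings the direct output reproduces OR and the complementary output reproduces NOR on all four input pairs, while scoring the same readings against AND succeeds on only half of them. Repeating such a score over many noisy runs gives a success fraction that can be compared across noise strengths or parameter settings.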

4. Composite Algorithmic Benchmarks: Chaos/Noise Discrimination

Methodologically distinct, but conceptually related, are unified chaos detection pipelines: 0–1 test, Benford’s-law compliance, and nonlinear noise reduction (Srivastava et al., 2021). This “ChaosBench” (Editor’s term) provides algorithmic logic for robustly labeling time series as Regular, Stochastic/Noise, or Chaotic. Key steps are:

0–1 Test: Transforms a time series $x(n)$ into translation variables $p_c(n), q_c(n)$ for randomly chosen frequencies $c$; computes the mean-square displacement $M_c(n)$ and the Pearson correlation $K_c$ of $M_c(n)$ with $n$. The median $K$ across the sampled values of $c$ gives:
- $K \approx 0$ regular, $K \approx 1$ chaotic.
- Pure noise can produce $K \approx 1$ (false positive), so the test alone is not reliable.
Benford’s Law Compliance Test (BLCT): Inspects the scale-invariant distribution of significant digits (base 10). Purely stochastic series yield a small, flat deviation statistic; deterministic content yields a higher and varying one.
Schreiber’s Nonlinear Denoising: Local averaging in time-delay embedding; recovers structure for moderately noisy deterministic series.
Decision logic: Combines these signals in a decision tree: $K \approx 0$ ⇒ regular; otherwise, if all BLCT statistics are small and nearly flat ⇒ stochastic/noise; otherwise denoise and retest $K$. Generalizes to any scalar experimental or simulated time series.
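A minimal sketch of the 0–1 test step follows, written in plain Python so that each quantity is visible. It follows the standard form of the test (translation variables, mean-square displacement, correlation with the lag, median over frequencies) and is not the pipeline of (Srivastava et al., 2021). The frequency range $(\pi/5, 4\pi/5)$, the lag cutoff of one tenth of the series length, the number of frequencies, and the logistic-map test signals are assumptions made for the example.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def k_for_frequency(series, c):
    """Correlation of the mean-square displacement with the lag for one c."""
    n_total = len(series)
    n_cut = n_total // 10
    p, q = [0.0], [0.0]                      # translation variables
    for j, value in enumerate(series, start=1):
        p.append(p[-1] + value * math.cos(j * c))
        q.append(q[-1] + value * math.sin(j * c))
    lags = list(range(1, n_cut + 1))
    msd = []
    for n in lags:
        total = 0.0
        for j in range(1, n_total - n_cut + 1):
            total += (p[j + n] - p[j]) ** 2 + (q[j + n] - q[j]) ** 2
        msd.append(total / (n_total - n_cut))
    return pearson(lags, msd)


def zero_one_test(series, n_freq=15, seed=0):
    """Median K over randomly drawn frequencies c."""
    rng = random.Random(seed)
    ks = [
        k_for_frequency(series, rng.uniform(math.pi / 5, 4 * math.pi / 5))
        for _ in range(n_freq)
    ]
    return statistics.median(ks)


def logistic_series(r, n=1000, x0=0.3, burn_in=200):
    """Logistic-map samples after discarding a transient."""
    x, out = x0, []
    for i in range(n + burn_in):
        x = r * x * (1 - x)
        if i >= burn_in:
            out.append(x)
    return out


k_chaotic = zero_one_test(logistic_series(4.0))   # fully chaotic regime
k_regular = zero_one_test(logistic_series(3.5))   # period-4 regime
print(round(k_chaotic, 3), round(k_regular, 3))
```

Taking the median over frequencies matters for the periodic signal: a few values of $c$ fall close to a resonance with the period-4 orbit and produce a spuriously large $K_c$, while most do not. As noted in the list, a high $K$ alone does not rule out noise, which is why the decision logic consults the Benford statistic before labeling a series chaotic.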

5. Cross-Domain Benchmarking and Logical Rule Integration

Benchmarks that integrate logic reasoning with physical or dynamical context require appropriate ontologies, axioms, and data representations (Liu et al., 2023). In the context of “ChaosBench-Logic,” these considerations are operationalized by:

Explicit formal specification: Grounding all queries in FOL predicates and rules; encoding data in self-contained JSON with equations, parameters, and logical annotations per system.
Evaluation pipeline: Modular pipelining—system loader, question generator, model API, logic closure verifier—enables scalable and reproducible benchmarking across classes of logic-capable models and algorithms.
Performance metrics: Logical consistency, execution time, and contradiction rate (axiom violations) are essential for evaluating the interplay between logic rules and empirical/system-based properties.
Generalization: The methodology readily generalizes beyond chaotic systems to any scientific or engineering domain wherein logic-based scientific reasoning over complex, sometimes empirically opaque data is required.
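The modular pipeline outlined above can be sketched end to end with a single self-contained record. Everything specific in this example is hypothetical: the JSON field names, the predicate names, the question template, and the stub model are placeholders rather than the benchmark's actual schema or API. The stages mirror the ones listed: a system loader, a question generator, a model call, and output normalization before scoring.

```python
import json

# Hypothetical record layout: equations, parameters, and logical annotations.
SYSTEM_RECORD = """
{
  "name": "logistic_map_r4",
  "equations": ["x[n+1] = r * x[n] * (1 - x[n])"],
  "parameters": {"r": 4.0},
  "predicates": {"Chaotic": true, "Deterministic": true, "Periodic": false}
}
"""

REQUIRED_FIELDS = {"name", "equations", "parameters", "predicates"}


def load_system(text):
    """System loader: parse a record and check that it is self-contained."""
    record = json.loads(text)
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record


def generate_atomic_questions(record):
    """Question generator: one TRUE/FALSE item per annotated predicate."""
    return [
        {
            "question": f"Is the predicate {pred} true of {record['name']}?",
            "predicate": pred,
            "answer": truth,
        }
        for pred, truth in sorted(record["predicates"].items())
    ]


def normalize(raw):
    """Map free-form model output to True, False, or None (unparseable)."""
    token = raw.strip().upper().rstrip(".")
    if token in {"TRUE", "YES"}:
        return True
    if token in {"FALSE", "NO"}:
        return False
    return None


def score(questions, model):
    """Per-item accuracy of a callable model(question_text) -> str."""
    correct = sum(normalize(model(q["question"])) == q["answer"] for q in questions)
    return correct / len(questions)


def always_yes(question_text):
    """Stub standing in for a real model API call."""
    return "Yes."


system = load_system(SYSTEM_RECORD)
questions = generate_atomic_questions(system)
accuracy = score(questions, always_yes)
print(len(questions), accuracy)
```

Keeping each stage as a plain function over dictionaries means a new system or a new task family only needs a new record or a new generator, which is the property the modular design relies on. A logic-closure verifier would slot in after `normalize`, operating on the collected answers per system.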

6. Implications for Scientific Reasoning and Tool Development

ChaosBench-Logic serves as a reference for both the fundamental limits and practical capabilities of logic-driven scientific reasoning, chaos detection, and unconventional computation:

For scientific reasoning engines and LLMs: Even with high per-item accuracy, global coherence and compositional reasoning remain incomplete. Benchmark results demonstrate the need for explicit logical closure, resistance to incorrect converse implications, and deeper symbolic tool integration (Thomas, 5 Jan 2026).
For physical computing devices: Chaotic attractor-hopping systems as logic gates provide energy-efficient, noise-tolerant, and high-density solutions, directly measurable using standardized ChaosBench-Logic metrics (Murali et al., 2018).
For algorithmic detection: Unified logic-based pipelines can robustly separate chaos from stochasticity, especially in low SNR and empirically contaminated regimes (Srivastava et al., 2021).
For theoretical research: The Boolean Glass network model provides a transparent, analytically tractable test-case for hard-switching chaos, explicitly linking symbolic, numerical, and geometric methods in chaos theory (Edwards, 2019).

7. Prospects and Future Directions

Potential generalizations of ChaosBench-Logic include richer multi-sorted ontologies (e.g., bifurcation relations), expanded system families (PDEs, high-dimensional flows), hybrid symbolic–numeric queries (for example, Lyapunov exponent estimation as a back-end service), and greater integration with formal methods or theorem-proving. Consistent with the modular pipeline design, such extensions require only the addition of logical predicate assignments and formal task families. This provides a unified foundation for evaluating and advancing neuro-symbolic approaches to rigorous scientific reasoning in machine learning and automated science platforms (Thomas, 5 Jan 2026).