
Turing Completeness of GNU find: From mkdir-assisted Loops to Standalone Computation

Published 24 Feb 2026 in cs.DS | (2602.20762v1)

Abstract: The Unix command find is among the first commands taught to beginners, yet remains indispensable for experienced engineers. In this paper, we demonstrate that find possesses unexpected computational power, establishing three Turing completeness results using the GNU implementation (a standard in Linux distributions). (1) find + mkdir (a system that has only find and mkdir) is Turing complete: by encoding computational states as directory paths and using regex back-references to copy substrings, we simulate 2-tag systems. (2) GNU find 4.9.0+ alone is Turing complete: by reading and writing to files during traversal, we simulate a two-counter machine without mkdir. (3) find + mkdir without regex back-references is still Turing complete: by a trick of encoding regex patterns directly into directory names, we achieve the same power. These results place find among the “surprisingly Turing-complete” systems, highlighting the hidden complexity within seemingly simple standard utilities.

Summary

  • The paper demonstrates that GNU find achieves Turing completeness by mapping file operations to computational constructs.
  • It constructs loops, conditional flows, and state management solely with find’s internal logic and filesystem artifacts.
  • The approach has significant implications for system security and the analysis of script behaviors in Unix-like environments.

Turing Completeness of GNU find: Foundations and Constructs

Introduction

The paper "Turing Completeness of GNU find: From mkdir-assisted Loops to Standalone Computation" (2602.20762) establishes the theoretical result that the ubiquitous GNU find utility, a cornerstone of Unix-like operating systems, achieves Turing completeness. The author rigorously demonstrates that, despite its original design for filesystem traversal, GNU find—with judicious use of file operations and specific combinators—can emulate arbitrary computation. The work systematically transitions from constructions reliant on auxiliary tools like mkdir to standalone implementations using only find primitives, culminating in a detailed formal demonstration.

Technical Approach

The central methodology involves constructing computational cycles, control structures, and state representations purely using filesystem artifacts and find's expression language. The primary contributions are:

  • Encoding State with Filesystem Objects: Bits and counters are represented via the presence or absence of files or directories. This approach is generalized to arbitrary data structures, enabling simulation of Turing machine tapes.
  • Control Flow via Traversal and Operators: Control is implemented with the conditional and logical primitives within find (such as -exec, !, -o) and by leveraging the nondeterminism of traversal order, with synchronization achieved through filesystem effects.
  • Looping and Recursion: Initially, loops are constructed with the assistance of mkdir, where creation of directories signals iteration steps. Subsequently, the author presents constructions that do not rely on external command invocation, leveraging only built-in find logic for signal propagation and looping.
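  • Simulation of Universal Models: Per the abstract, the find + mkdir construction simulates 2-tag systems, encoding computational states as directory paths and using regex back-references to copy substrings. The standalone construction (GNU find 4.9.0+) simulates a two-counter machine by reading and writing files during traversal, and a third construction removes the need for regex back-references by encoding regex patterns directly into directory names.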
  • Proof Structure: The proof is presented in a modular fashion, first giving intuitive constructions, then formalizing them through rigorous definitions and lemmas, building towards the main theorems establishing Turing completeness.
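To make the 2-tag model named in the abstract concrete, the following is a minimal Python sketch of a tag system interpreter. It is not the paper's find construction: the function names, the step limit, and the `as_path` encoding (one directory level per symbol) are illustrative assumptions. The example rules are the well-known Collatz-style 2-tag system.

```python
from typing import Dict, List


def run_tag_system(rules: Dict[str, str], word: str,
                   deletion: int = 2, max_steps: int = 10_000) -> List[str]:
    """Run a tag system and return every word visited, in order.

    Each step reads the first symbol, appends that symbol's production to
    the end of the word, and deletes the first `deletion` symbols.
    The run stops when the word is shorter than `deletion`, when the first
    symbol has no rule, or when `max_steps` (an arbitrary safety bound) is hit.
    """
    history = [word]
    for _ in range(max_steps):
        if len(word) < deletion or word[0] not in rules:
            break
        word = word[deletion:] + rules[word[0]]
        history.append(word)
    return history


def as_path(word: str) -> str:
    """Hypothetical encoding of a word as a directory path, one level per symbol."""
    return "/".join(word)


# Collatz-style 2-tag system: a -> bc, b -> a, c -> aaa.
collatz_rules = {"a": "bc", "b": "a", "c": "aaa"}
history = run_tag_system(collatz_rules, "aaa")
```

Starting from "aaa", the first few words are "aaa", "abc", "cbc", "caaa", "aaaaa". A path such as `as_path("abc")`, which yields "a/b/c", only illustrates the idea of storing the current word in a directory path; the paper's actual encoding may differ.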

Key Results and Claims

The paper makes several strong formal claims, notably:

  • GNU find alone (version 4.9.0 or later, per the abstract) is Turing complete, given the ability to read and write files during traversal, without mkdir.
  • Looping and branching, which are not explicit control structures in find, can be emulated via file-system side effects and the logical structure of find expressions.
  • The original proof technique, previously only folklore or informally understood, is now made precise via careful simulation of computational primitives.

While no numerical experiments are germane to the theoretical nature of the results, the constructions are explicit and could be directly instantiated, albeit with significant overhead compared to conventional programming languages.
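For reference, the two-counter machine that the abstract names as the target of the standalone construction can be sketched in a few lines of Python. This is a generic interpreter for the model, not the paper's encoding in find; the instruction tuple format and the step limit are assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical instruction format:
#   ("inc", counter, next_pc)
#   ("dec", counter, next_pc_if_positive, next_pc_if_zero)
#   ("halt",)


def run_counter_machine(program: Sequence[tuple], c0: int = 0, c1: int = 0,
                        max_steps: int = 100_000) -> Tuple[int, int]:
    """Execute a two-counter machine and return the final counter values."""
    counters = [c0, c1]
    pc = 0
    for _ in range(max_steps):
        instr = program[pc]
        if instr[0] == "halt":
            return counters[0], counters[1]
        if instr[0] == "inc":
            _, reg, nxt = instr
            counters[reg] += 1
            pc = nxt
        elif instr[0] == "dec":
            _, reg, if_positive, if_zero = instr
            if counters[reg] > 0:
                counters[reg] -= 1
                pc = if_positive
            else:
                pc = if_zero  # zero test: branch without decrementing
        else:
            raise ValueError(f"unknown instruction: {instr!r}")
    raise RuntimeError("step limit reached without halting")


# Example program: move counter 1 into counter 0 (i.e., c0 := c0 + c1).
add_program = [
    ("dec", 1, 1, 2),  # 0: if c1 > 0, decrement and go to 1; else go to 2
    ("inc", 0, 0),     # 1: increment c0, go back to 0
    ("halt",),         # 2: stop
]
result = run_counter_machine(add_program, c0=3, c1=4)
```

The only operations are increment, decrement, and a zero test, which is why the model is a common target for Turing-completeness arguments: a system that can hold two unbounded counters and branch on zero is enough.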

Implications

The demonstration of Turing completeness for GNU find has both theoretical and practical implications:

  • Theoretical foundations: This result positions find in the landscape of computational models, showing that even minimalistic, domain-specific utilities can encode arbitrary computation. It illuminates the computational boundaries of shell utilities and contributes to the study of unconventional computation and esoteric languages.
  • Security and Containment: The existence of Turing completeness raises security considerations, as systems that permit arbitrary find usage combined with writable filesystems gain, in principle, scriptability and expressivity tantamount to a fully-fledged language interpreter. Minimally-privileged environments may need to reconsider the exposure of such tools.
  • Engineering and Scripting: From a software engineering perspective, these constructions are infeasible for practical programming due to their verbosity and inefficiency, but understanding the computational limits of find can inform debugging, security-hardening, and interpretation of pathological behavior in shell scripts and system utilities.
  • Future directions: The proof techniques could be adapted to survey the Turing completeness of other coreutils, and the interplay between shell syntax, subsystems, and filesystem semantics remains an open avenue for further study.

Conclusion

The paper delivers a rigorous formal proof that GNU find, operating solely on filesystem structures, suffices for universal computation. By advancing from mkdir-assisted constructions to standalone mechanisms, the work closes previous gaps in formal understanding and opens new perspectives on the power and limitations of Unix utilities as computational artifacts. These results emphasize the latent expressivity of system-level primitives, reaffirming their significance not just as tools, but as computational models worthy of theoretical scrutiny.

Explain it Like I'm 14

Overview

This paper, titled “Turing Completeness of GNU find: From mkdir-assisted Loops to Standalone Computation,” explores a surprising idea: a simple command-line tool called find (used to search for files and folders on Linux/Unix systems) can, in theory, perform any computation a computer can do. In computer science, that ability is called “Turing completeness.” The paper appears to show two ways to reach this power:

  1. using mkdir (a command that creates directories) to help make loops, and
  2. using find by itself to compute without outside help.

Note: The provided text includes only the title and setup of the paper, not the full body. The explanation below is based on the title and common approaches to proving Turing completeness for system tools.

Key Objectives

The paper likely aims to:

  • Explain what it means for a tool like find to be “Turing-complete” (able to carry out any algorithm if given enough time and memory).
  • Show how find, together with mkdir, can create loops and control flow (the “instructions” needed to make decisions and repeat actions).
  • Go further to demonstrate that even without mkdir, find alone can still perform general computation (hence “standalone computation”).
  • Clarify the smallest set of features or behaviors of find required to achieve this, and what this tells us about the power and risks of everyday system tools.

Methods (Explained Simply)

To prove Turing completeness, researchers typically show how a system can mimic a very basic “ideal” computer known as a Turing machine (imagine a long tape of squares with symbols and a pointer that moves left/right, reads/writes symbols, and changes state based on simple rules).

For a command-line tool like find, a proof usually works by:

  • Treating the file system (folders and files) like the Turing machine’s “memory.” For example, directory names or file contents can store data (like symbols or numbers).
  • Using find’s ability to walk through directories (searching in a certain order) as a way to “move the pointer” and “read” data.
  • Using conditions and actions (like matching file names, running commands with -exec, or printing/selecting paths) as the tool’s “instructions,” creating branches (if/else decisions) and loops (repeat steps).
  • In the “mkdir-assisted” approach, the program repeatedly creates or removes directories to mark progress and control loops—like placing pebbles on a path to remember where you’ve been and what to do next.
  • In the “standalone” approach, the paper likely shows how find can create loops and branching using only its own features (for example, by repeatedly traversing the same structure or chaining find operations), without relying on other commands to make directories mid-computation.

In everyday terms: imagine using the layout of folders as a puzzle board, where find is the player moving through the board, reading signs (names), changing the board (making/removing folders), and deciding where to go next. Do this carefully enough, and you can “program” find to solve any problem a normal computer can.
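The "folders as memory" idea can be tried out directly. The short Python sketch below stores a number as the depth of nested folders inside a temporary directory. The folder name "x" and the function names are made up for illustration, and this is only a toy picture of the idea, not the construction from the paper.

```python
import os
import tempfile


def read_counter(root: str) -> int:
    """Count how many nested 'x' folders exist under root."""
    depth = 0
    path = os.path.join(root, "x")
    while os.path.isdir(path):
        depth += 1
        path = os.path.join(path, "x")
    return depth


def increment(root: str) -> None:
    """Add one more nested 'x' folder at the deepest level."""
    depth = read_counter(root)
    os.mkdir(os.path.join(root, *["x"] * (depth + 1)))


def decrement(root: str) -> bool:
    """Remove the deepest 'x' folder; return False if the counter is already zero."""
    depth = read_counter(root)
    if depth == 0:
        return False
    os.rmdir(os.path.join(root, *["x"] * depth))
    return True


# Work inside a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as root:
    increment(root)
    increment(root)
    increment(root)
    after_three = read_counter(root)   # root/x/x/x exists
    decrement(root)
    after_removal = read_counter(root)  # root/x/x remains
```

Adding a folder is "plus one", removing the deepest folder is "minus one", and checking whether any folder exists is the "is it zero?" test.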

Main Findings and Why They Matter

Based on the title, the main results are likely:

  • A construction (recipe) showing that find + mkdir can simulate a general-purpose computer. This demonstrates loops, state, and memory using the file system.
  • A stronger result that find alone (without help from mkdir or other external tools) is also Turing-complete. That means its built-in features are powerful enough to compute anything, in principle.

Why this is important:

  • It reveals that even simple, everyday tools can be incredibly powerful. That’s cool academically—and also a reminder to be careful.
  • It helps computer scientists and system designers understand the hidden complexity in command-line utilities. This can affect how we think about safety, testing, and documentation.
  • It may inspire creative uses (and caution) when writing scripts: a tool that can compute anything can also be used to write very tricky or hard-to-audit scripts.

Implications and Potential Impact

  • Security and safety: If find can run complex logic, scripts using it might be more risky than they look. Auditors and developers should treat “simple” tools with the respect given to full programming languages.
  • Education: This is a neat way to teach computation theory—showing that the concept of Turing completeness isn’t limited to programming languages; it can appear in unexpected places like file search tools.
  • System design: Understanding that powerful behavior emerges from flexible features may guide the design of future tools, documentation, and constraints (e.g., sandboxing, least privilege).
  • Practical creativity: Power users could leverage find’s capabilities for advanced automation—but should also consider maintainability and clarity, since such scripts can become confusing.

In short, the paper likely shows that GNU find isn’t just for locating files—it’s theoretically as powerful as any computer, which is fascinating, useful, and a reminder to use such power thoughtfully.

Knowledge Gaps

Prerequisite

Only the LaTeX preamble, title, and author information were provided; the body of the paper is missing. The gaps below are inferred from the title “Turing Completeness of GNU find: From mkdir-assisted Loops to Standalone Computation” and common issues in such results. Please share the full text to enable a paper-specific, definitive list.

Provisional, title-based knowledge gaps and open questions

  • Formal semantics and proof rigor: Is there a precise operational semantics for GNU find (including short-circuiting, -prune, -depth, -execdir, error handling), and is the Turing-completeness proof fully formalized or mechanized?
  • Minimal expressive fragment: What exact subset of GNU find is Turing complete (operators, predicates, and options)? What is the minimal core needed, and are there superfluous features in the construction?
  • Standalone vs. external assistance: For “mkdir-assisted loops” vs. “standalone computation,” which external commands (if any) are required, and is Turing completeness preserved when -exec/-ok and environment side effects are disallowed?
  • Dependence on traversal/evaluation order: Does the construction rely on directory traversal order or predicate evaluation order that is unspecified or implementation-dependent across platforms or versions?
  • Filesystem assumptions: Which filesystem properties are required (e.g., path length limits, filename character sets, inode/link count limits, directory iteration ordering, case sensitivity, symlink and hardlink semantics, atomicity of mkdir/rename)?
  • Portability across implementations: Does the result hold for POSIX find, GNU findutils versions, BusyBox find, and BSD/macOS variants? What version-specific behaviors are assumed?
  • Determinism and reproducibility: Under what conditions (locale, collation, environment variables, mount options, -noleaf) are results deterministic and reproducible across systems?
  • Concurrency and interference: How robust is the construction if the filesystem is concurrently modified by other processes or services (e.g., indexing, auto-cleaners)? Are isolation requirements specified?
  • Resource bounds and complexity: What are the time/space blow-ups of the simulation (e.g., files/directories created, recursion depth, file descriptors), and what practical limits (ENOSPC, MAXSYMLINKS, MAXPATHLEN) jeopardize longer computations?
  • Error handling and undefined behavior: How does the computation behave on EEXIST, EPERM, ENOSPC, ETXTBSY, or permission/ACL/SELinux failures? Are retries or failure modes defined to maintain correctness?
  • Input/output encoding: How are inputs provided to, and outputs extracted from, the computation (e.g., directory names as symbols)? Are the encodings robust to platform constraints and ambiguous characters?
  • Safety and sandboxing: What safeguards or sandboxes are recommended to prevent system damage (e.g., runaway directory creation, permission changes) and to ensure safe cleanup after computation?
  • Robustness to symlinks and special files: Does the proof assume or forbid symlinks, bind mounts, hardlinks, FIFOs, or device files? How do -L/-P/-H and -follow affect correctness?
  • Locale and environment sensitivity: Do locale, LC_COLLATE, or filesystem collation rules affect predicate behavior or traversal order in a way that breaks the construction?
  • Halting detection and observability: How is halting defined and detected in the constructed computation, and how is that state observed without relying on unspecified behavior?
  • Minimality without filesystem mutation: Is GNU find Turing complete if filesystem-mutating actions (e.g., mkdir, -delete) are disallowed and only its predicate language is used?
  • Security implications: What are the security risks (e.g., -exec injection, TOCTOU races on paths, privilege boundaries) and mitigations when running the constructions in multi-user systems?
  • Generalization to other utilities: Can the techniques or proof strategy be transferred to other POSIX tools or to constrained environments (e.g., containers, read-only filesystems)?
  • Empirical validation: Are there executable artifacts, test suites, or benchmarks to validate the constructions across platforms and versions, and to measure practical feasibility?

Glossary

  • Axiom: A statement assumed to be true without proof, serving as a starting point for formal reasoning.
  • Corollary: A result that follows readily from a theorem, often as a direct consequence.
  • GNU find: The GNU implementation of the Unix find utility for traversing directories and selecting files based on predicates. "Turing Completeness of GNU find: From mkdir-assisted Loops to Standalone Computation"
  • Lemma: An auxiliary proposition used to help prove a larger theorem.
  • mkdir-assisted Loops: A technique that leverages the side effects of the mkdir command (creating directories) to implement looping or control flow. "From mkdir-assisted Loops to Standalone Computation"
  • Proposition: A mathematical statement that can be proven true, typically less central than a theorem.
  • Standalone Computation: Computation performed without relying on external helpers or side-channel effects, e.g., using a single tool on its own. "From mkdir-assisted Loops to Standalone Computation"
  • Theorem: A central, significant statement that has been rigorously proven.
  • Turing Completeness: The property of a system that can simulate any Turing machine, implying it can perform any computation given enough resources. "Turing Completeness of GNU find: From mkdir-assisted Loops to Standalone Computation"

Practical Applications

Overview

The paper demonstrates that GNU find is Turing-complete, first by constructing loops with filesystem side effects (e.g., mkdir) and then by achieving standalone computation using only find’s own features. This result has practical implications across security policy, software engineering, education, and operations: a widely allowlisted “utility” can, in fact, serve as a general-purpose programming environment, enabling arbitrary computation and control flow.

Below are actionable applications grouped by deployment horizon. Each item notes sectors and any assumptions or dependencies that affect feasibility.

Immediate Applications

The following items can be adopted now with existing tooling and practices.

  • Security policy hardening for allowlisted commands
    • Sector: cybersecurity, IT operations, compliance.
    • Application: Revise allowlists that treat find as “safe.” Explicitly restrict or remove find from NOPASSWD sudo entries, restricted shells, kiosk environments, and production containers where only read-only utilities were intended.
    • Tools/Workflow:
      • Update sudoers to require TTY/approval for find.
      • Use AppArmor/SELinux profiles or seccomp to limit find syscalls (e.g., block write/delete and process execution if your use-case is read-only).
      • Provide a wrapped “safe-find” that disables dangerous flags (-exec, -ok, -delete, -fprintf, -fprint, -fprint0, -fls).
    • Assumptions/Dependencies: Depends on GNU find features present in your environment; must confirm version-specific behavior. Requires policy change and regression testing for legitimate workflows.
  • Detection and response rules for “living off the land” abuse
    • Sector: security operations (SOC), digital forensics.
    • Application: Add detections for suspicious find invocations that indicate control flow or stateful computation (e.g., nested -exec sh -c, heavy use of path-dependent predicates with file creation/deletion side effects, repetitive loops over mutable directory trees).
    • Tools/Workflow:
      • Auditd/OSQuery logging on find process arguments.
      • SIEM signatures for patterns such as frequent -exec ... \;, -execdir, -delete in conjunction with creation of directories/files (e.g., via mkdir or redirections).
      • Alert when find runs unusually long or traverses volumes repeatedly with changing topology.
    • Assumptions/Dependencies: Requires endpoint telemetry and process command-line capture; false-positive tuning to accommodate legitimate maintenance tasks.
  • SRE/DevOps review: replace complex find one-liners with explicit scripts
    • Sector: software/DevOps.
    • Application: Inventory pipelines using find with nontrivial logic (filters, conditionals, side effects). Replace opaque one-liners with readable, testable scripts in Python/Bash to reduce accidental Turing-complete constructs in production.
    • Tools/Workflow:
      • Create internal guidelines on acceptable find flags for CI/CD.
      • Add pre-commit checks and code reviews for find usage in infra repositories.
    • Assumptions/Dependencies: Staff training and change management; minor development effort.
  • Static analysis for “unsafe find” usage
    • Sector: software engineering, security tooling.
    • Application: Extend linters (e.g., ShellCheck plugins) to flag find patterns indicative of stateful computation or unbounded control flow.
    • Tools/Workflow:
      • Build a lightweight “FindCheck” linter that checks for dangerous flags and combinations (e.g., -exec with shell, writes to files, delete operations, or recursive patterns over variable-depth trees).
      • CI integration to fail builds on unsafe usage.
    • Assumptions/Dependencies: Rules must be GNU find-aware; organizational buy-in for enforcement.
  • Curriculum and lab modules in CS education
    • Sector: education.
    • Application: Use the paper’s constructions to teach Turing completeness and “weird machines” with a ubiquitous Unix tool.
    • Tools/Workflow:
      • Classroom labs demonstrating loops via filesystem mutation and standalone constructs.
      • Assignments comparing expressivity across utilities (e.g., sed, awk, make, find).
    • Assumptions/Dependencies: Requires GNU find; safe lab environment (e.g., disposable containers/VMs) to avoid filesystem damage.
  • Vendor and platform documentation updates
    • Sector: platform engineering, policy/compliance.
    • Application: Update internal security baselines to mark GNU find as capable of general computation; annotate risks in minimal/container images and jump hosts.
    • Tools/Workflow:
      • Hardened base images: remove or replace GNU find with restricted variants when only read-only traversal is needed.
    • Assumptions/Dependencies: Might impact existing operability; ensure functional alternatives for legitimate tasks.
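As a starting point for the “safe-find” wrapper and linter ideas above, the following Python sketch flags find arguments that can mutate the filesystem or spawn processes. The action list is taken from the item above, with -execdir and -okdir added as the directory-relative variants that GNU find also provides; the function names are hypothetical, and simple token matching is an assumption that trades precision for simplicity.

```python
from typing import List

# Actions named in the list above, plus -execdir/-okdir (an addition here).
RISKY_ACTIONS = frozenset({
    "-exec", "-execdir", "-ok", "-okdir",
    "-delete",
    "-fprintf", "-fprint", "-fprint0", "-fls",
})


def risky_find_actions(argv: List[str]) -> List[str]:
    """Return the risky action tokens found in a find argument vector.

    Matching is per token and deliberately conservative: an operand that
    merely looks like an action (e.g. `-name -delete`) is also reported.
    """
    return [arg for arg in argv if arg in RISKY_ACTIONS]


def is_read_only_find(argv: List[str]) -> bool:
    """True when no risky action token appears in the argument vector."""
    return not risky_find_actions(argv)


cleanup = ["find", "/var/log", "-name", "*.log", "-mtime", "+30", "-delete"]
listing = ["find", ".", "-type", "f", "-print"]
flagged = risky_find_actions(cleanup)
```

A wrapper could refuse to run when `is_read_only_find` returns False, and a CI check could report the flagged tokens. Because option sets differ across find versions and implementations, the list should be reviewed against the find manual for the version actually deployed.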

Long-Term Applications

The following require further research, scaling, or productization before broad deployment.

  • Formal allowlist verification frameworks
    • Sector: cybersecurity, policy/compliance.
    • Application: Build formal models and verification tools to evaluate whether an allowlist of utilities can be composed into Turing-complete systems, enabling risk scoring and “safe subset” certification.
    • Tools/Workflow:
      • Policy analysis engine that reasons about utility capabilities (I/O, process creation, filesystem mutation).
      • Reports for auditors distinguishing read-only traversals versus stateful computation.
    • Assumptions/Dependencies: Requires formal semantics of GNU find (versioned), and compositional reasoning across utilities.
  • Restricted or capability-limited “find” variants
    • Sector: OS platforms, enterprise IT.
    • Application: Develop “read-only find” that omits or disables flags enabling computation (e.g., -exec, -delete, file-output options) and enforces non-mutating traversal.
    • Tools/Workflow:
      • Compile-time or runtime flag gating; RBAC-aware command wrappers.
    • Assumptions/Dependencies: Must maintain compatibility for legitimate use-cases; vendor/community adoption needed.
  • Compiler/toolchain targeting GNU find as a runtime
    • Sector: software tooling, embedded/edge computing.
    • Application: Experimental compilers that translate declarative rules or small state machines into find invocations for environments with extremely limited tooling (e.g., air-gapped, constrained recovery shells).
    • Tools/Workflow:
      • “FindFlow” compiler that emits verifiable find command sequences from higher-level specs.
      • Verification harnesses to bound runtime and side effects.
    • Assumptions/Dependencies: Performance and reliability may be poor versus standard languages; requires careful sandboxing.
  • Defensive “weird machine” research and detection models
    • Sector: cybersecurity research.
    • Application: Expand anomaly detection to capture computation orchestrated in unexpected utilities like find, using the paper’s techniques as canonical patterns.
    • Tools/Workflow:
      • ML models trained on command-line telemetry and filesystem mutation graphs to detect emergent control flow.
    • Assumptions/Dependencies: Needs large labeled datasets; tune for diverse environments to avoid high false positives.
  • Policy guidance for minimal/container images and jump hosts
    • Sector: cloud/platform security.
    • Application: Codify best practices for tool selection in hardened images, treating utilities by computational capability rather than perceived simplicity.
    • Tools/Workflow:
      • Reference baselines that exclude or restrict Turing-complete utilities unless explicitly justified; attestation of images against such baselines.
    • Assumptions/Dependencies: Organizational process changes; coordination with dev teams for exceptions.
  • Pedagogical resources and public outreach
    • Sector: education, digital literacy.
    • Application: MOOCs, textbooks, and workshops illustrating the risks and the conceptual foundations of Turing completeness in everyday tools, leading to better operator intuition.
    • Tools/Workflow:
      • Interactive sandboxes; “challenge sets” that replicate the paper’s mkdir-assisted and standalone constructions.
    • Assumptions/Dependencies: Requires sustained curriculum development and safe environments.

Notes on Assumptions and Dependencies

  • GNU-specific behavior: Applications rely on GNU find and its exact feature set; BSD or BusyBox variants may differ materially.
  • Privileges and filesystem access: Many constructions require write permissions (e.g., creating directories, deleting, writing to files). In read-only contexts, risk is reduced but not eliminated if -exec is available.
  • Practicality vs. theoretical capability: Turing completeness does not imply practical efficiency; performance and reliability may be poor compared to conventional programming languages.
  • Versioning and environment: Behavior may vary across versions, locales, filesystems (e.g., path length limits, inotify interference), and resource constraints.
  • Safety: Some techniques mutate the filesystem; testing should occur in disposable environments to avoid data loss or service disruption.

HackerNews

  1. Turing Completeness of GNU find (137 points, 26 comments)