
Large language models are not about natural language

Published 15 Dec 2025 in cs.CL and q-bio.NC | (2512.13441v1)

Abstract: LLMs are useless for linguistics, as they are probabilistic models that require a vast amount of data to analyse externalized strings of words. In contrast, human language is underpinned by a mind-internal computational system that recursively generates hierarchical thought structures. The language system grows with minimal external input and can readily distinguish between real language and impossible languages.

Summary

  • The paper argues that LLMs simulate superficial language patterns rather than capturing the deep, recursive structures fundamental to human language.
  • It contrasts human language acquisition—characterized by the poverty of the stimulus—with LLMs’ dependence on massive data and pattern matching.
  • The analysis highlights the unsustainable computational costs and energy demands of LLMs, questioning their viability as models of cognitive language processes.

LLMs Are Not About Natural Language: A Critical Perspective

Overview of the Argument

The paper "LLMs are not about natural language" (2512.13441) offers a rigorous critique of the proposition that LLMs contribute meaningfully to theoretical linguistics or cognitive science. The authors argue that LLMs are fundamentally inadequate as models of human language cognition because their probabilistic, data-intensive approaches diverge radically from the biological mechanisms underlying human linguistic competence. Central to this critique is the assertion that the architecture, learning dynamics, and inductive biases of LLMs are orthogonal to the recursive, structure-building faculties posited by generative linguistics.

Critique of Probabilistic Modeling in LLMs

The paper positions LLMs as a contemporary extension of century-old probabilistic models, noting that their primary operation is a statistical analysis of "flattened" token sequences. In contrast, the human language faculty, as articulated in generative grammar and the Strong Minimalist Thesis, is described as a mind-internal recursive mechanism capable of constructing hierarchically organized structures from minimal input. The probabilistic nature of LLMs is claimed to be essentially incompatible with the recursion-driven generative procedure of human syntax and semantics. The authors argue that, while LLMs can simulate surface-level linguistic patterns, they lack the recursive generative power required to represent underlying compositional structures that determine meaning.
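
To make the contrast concrete, here is a minimal illustrative sketch (ours, not the paper's) of the two computations the authors oppose: a bigram model that estimates next-token probabilities from flattened strings, alongside a toy Merge operation that recursively builds unlabeled binary constituents. All function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Surface statistics: estimate P(next token | current token)
    from flattened word strings, with no hierarchical structure."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: n / sum(nxts.values()) for nxt, n in nxts.items()}
            for cur, nxts in counts.items()}

def merge(alpha, beta):
    """Recursive Merge: combine two syntactic objects into a new binary
    constituent. Repeated application yields hierarchy, not a string."""
    return (alpha, beta)

model = train_bigram(["the old man left", "the man left"])
print(model["the"])                       # {'old': 0.5, 'man': 0.5}
print(merge("the", merge("old", "man")))  # ('the', ('old', 'man'))
```

The bigram model outputs a probability distribution over linear continuations; Merge outputs a hierarchical object. On the paper's view, only the latter kind of computation characterizes the human language faculty.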

Language Acquisition: Data, Inductive Biases, and Learning

A central empirical argument derives from the "Poverty of the Stimulus" paradigm. The authors emphasize that human children acquire the core of syntactic competence from modest, often incomplete, input. Infants display robust generalizations and construct grammatical forms not attested in the ambient input, a feat that LLMs, dependent on massive datasets and computational resources, fail to replicate. The paper highlights neurobiological and developmental evidence that human learners internalize linguistic constraints (e.g., distinctions among possible, impossible, and unattested languages) that LLMs do not discover or represent, even when trained on curated language data.
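
The structure-dependence of question formation is a standard illustration of this argument. The sketch below (our hypothetical illustration, not code from the paper) shows how a surface-based rule and a structure-based rule diverge on sentences with embedded clauses; children follow the structural rule despite rarely, if ever, encountering the disambiguating input.

```python
AUX = {"is", "can", "will"}

def question_linear(tokens):
    """Surface rule: front the FIRST auxiliary in the string."""
    i = next(j for j, t in enumerate(tokens) if t in AUX)
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]

def question_structural(tokens, main_aux_index):
    """Structural rule: front the MAIN-CLAUSE auxiliary; the index stands
    in for the hierarchical parse a real grammar would supply."""
    i = main_aux_index
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]

s = "the man who is tall is happy".split()
print(question_linear(s))         # ungrammatical: is the man who tall is happy
print(question_structural(s, 5))  # correct: is the man who is tall happy
```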

The critique further addresses the distinct developmental trajectories observable in natural language acquisition, contrasting the staged, overgenerating grammars of children with the pattern-matching regimes of LLMs. In particular, LLMs "learn" only by adjusting model parameters in response to direct exposure, with no observable analog of the stages seen in human first-language development.

LLMs and the Problem of Impossible Languages

The authors present experimental and computational evidence that LLMs do not distinguish between possible (i.e., humanly learnable) and "impossible" (unnatural or unattested) languages. They cite recent studies (Ziv et al., 2025; Luo et al., 2024) showing that LLMs perform comparably on natural and reverse-ordered (backward) English, a result not mirrored by human subjects. The human syntactic faculty, often localized in Broca's area, responds selectively to violations of structural constraints, something that LLMs, lacking explicit recursive or constituent-based representations, cannot replicate.
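
The manipulation behind such experiments can be pictured with a small sketch (ours; the cited studies' actual pipelines differ): an "impossible" counterpart of a natural corpus is generated by a deterministic perturbation such as sentence-level token reversal, and models trained on each corpus are then compared.

```python
def reverse_language(corpus):
    """Create an 'impossible' counterpart of a corpus by reversing the
    token order of every sentence, a transformation no natural language
    instantiates."""
    return [" ".join(reversed(sentence.split())) for sentence in corpus]

natural = ["the child saw the dog", "dogs chase cats"]
impossible = reverse_language(natural)
# ['dog the saw child the', 'cats chase dogs']
# A model trained on each corpus is then evaluated on held-out perplexity;
# comparable scores indicate no built-in preference for the humanly
# possible language.
```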

Furthermore, the authors dissect recent claims from proponents of LLMs as cognitive models, critiquing methodologies such as shuffled-sentence benchmarks and arguing that large performance drops on truly structureless input do not establish competence in structural generalization.
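
For illustration (again ours, not the critiqued studies' code), a shuffled benchmark destroys linear and hierarchical structure alike, so degraded performance on such input shows sensitivity to word order in general rather than command of hierarchy:

```python
import random

def shuffle_sentence(sentence, seed=0):
    """Randomly permute tokens, destroying linear and hierarchical
    structure at once."""
    tokens = sentence.split()
    random.Random(seed).shuffle(tokens)
    return " ".join(tokens)

print(shuffle_sentence("the old man who is tall left early"))
```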

Computational Efficiency and Resource Expenditure

A salient non-linguistic argument concerns the vast disparity in energy and computational efficiency between human and artificial systems. While the human brain operates on approximately 20 W, LLMs require extraordinary infrastructure, measured in tens of megawatts and sustained by practices the authors regard as unsustainable, as exemplified by reports of xAI's and Google's planned nuclear facilities. In the authors' view, this energy profile further undermines the cognitive plausibility of current LLM approaches.
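
As a back-of-the-envelope comparison using the figures cited above (20 W for the brain; "tens of megawatts" for LLM infrastructure, with 50 MW as an assumed illustrative stand-in):

```python
brain_watts = 20          # figure cited by the authors
datacenter_watts = 50e6   # assumed stand-in for "tens of megawatts"
print(f"LLM infrastructure draws ~{datacenter_watts / brain_watts:,.0f}x "
      f"the power of a human brain")  # ~2,500,000x
```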

Theoretical and Practical Implications

The authors conclude that LLMs do not offer insight into the innate constraints, compositionality, or cognitive mechanisms of natural language. Any superficial convergence between LLM outputs and human utterances is, according to this view, a matter of shallow pattern matching rather than the elucidation of cognitive universals. The paper suggests that continued conflation of LLM performance with linguistic competence risks obfuscating the core explanatory aims of biolinguistics and that models informative for linguistic theory must address recursion, structure-dependence, and the rational induction of hierarchical representations.

Future Directions

The stance advanced in the paper calls for models of language that integrate explicit structural constraints, are data-efficient, and reflect biologically and developmentally plausible learning trajectories. On this view, future AI research that seeks to inform or align with cognitive science should embed universal-grammar-like inductive biases and treat hierarchy and recursion as primitives, not emergent artifacts of massive statistical training. Without these, LLM-driven approaches are likely to remain, in the authors' estimation, tools for simulating surface properties rather than means of advancing our understanding of the language faculty.

Conclusion

The paper posits that LLMs, as currently architected and trained, are not viable models of natural language cognition. Their probabilistic, data-dependent mechanisms, lack of hierarchical recursion, and inability to distinguish possible from impossible languages mark them as computational tools rather than cognitive or linguistic models. As such, researchers are urged to maintain a strict distinction between engineering success in language simulation and explanatory adequacy in cognitive theory.
