Non-transformer effective models for LLM operation

Determine whether the operation of a large language model, under a fixed prompt and task setting, can be accurately captured by a small effective sequence-to-sequence network that is not a transformer. If such an alternative effective model exists, construct and characterize it.

Background

The core analysis assumes that an LLM’s behavior on specific prompts and tasks can be modeled by a small effective transformer whose parameters slightly differ from those of an idealized error-free model. This effective-model assumption underpins the derivation of the proposed accuracy law.
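To make the assumption concrete, here is a minimal formalization using notation introduced purely for illustration (the excerpt itself fixes no symbols): writing p_LLM for the LLM’s conditional output distribution under the fixed prompt and task setting, the assumption posits a small effective transformer f_theta whose parameters are a slight perturbation of an idealized error-free parameter set, e.g. in LaTeX:

    p_{\mathrm{LLM}}(y \mid x) \;\approx\; f_{\theta}(y \mid x),
    \qquad \theta = \theta^{\star} + \delta\theta,
    \qquad \lVert \delta\theta \rVert \ll \lVert \theta^{\star} \rVert,

where \theta^{\star} denotes the idealized error-free parameters and \delta\theta the small deviation.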

The authors explicitly raise the question of whether an LLM’s operation could instead be modeled by some other effective sequence-to-sequence architecture and leave it to future work, signaling that the existence and form of non-transformer effective models remain an open issue.
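One concrete way to probe this question is sketched below, under assumptions not in the source: PyTorch, a distillation objective, and a tiny random transformer standing in for the frozen LLM so the script runs end to end. The idea is to fit a small LSTM sequence-to-sequence network to match the LLM’s next-token distributions on the fixed prompt/task distribution and check whether the KL divergence between the two can be driven near zero.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    VOCAB, SEQ_LEN = 256, 32

    class TinyTeacher(nn.Module):
        """Stand-in for the frozen LLM: a tiny random causal transformer,
        used here only so the sketch is self-contained."""
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, 64)
            layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
            self.enc = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(64, VOCAB)

        def forward(self, x):
            mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
            return self.head(self.enc(self.emb(x), mask=mask))

    class LSTMStudent(nn.Module):
        """Candidate non-transformer effective model: a small LSTM seq2seq."""
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, 64)
            self.rnn = nn.LSTM(64, 128, num_layers=2, batch_first=True)
            self.head = nn.Linear(128, VOCAB)

        def forward(self, x):
            h, _ = self.rnn(self.emb(x))
            return self.head(h)

    teacher, student = TinyTeacher().eval(), LSTMStudent()
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)

    for step in range(201):
        # Stand-in for prompts drawn from the fixed task distribution.
        x = torch.randint(0, VOCAB, (16, SEQ_LEN))
        with torch.no_grad():
            log_p_teacher = F.log_softmax(teacher(x), dim=-1)
        log_p_student = F.log_softmax(student(x), dim=-1)
        # KL(teacher || student), averaged over the batch; a divergence that
        # can be driven near zero would indicate the small non-transformer
        # model captures the teacher's behavior on this task.
        loss = F.kl_div(log_p_student, log_p_teacher, log_target=True,
                        reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 50 == 0:
            print(f"step {step:3d}  KL(teacher || student) = {loss.item():.4f}")

Under this framing, a divergence that plateaus well above zero across student capacities would suggest transformer-specific structure is essential, while a divergence driven near zero would constructively exhibit a non-transformer effective model for that prompt and task setting.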

References

“However, one might ask whether the operation of the LLM could be modeled via some other effective sequence-to-sequence network. We leave this question to future work.”

Raju et al., “A model of errors in transformers,” arXiv:2601.14175, 20 Jan 2026; Appendix, Section “Discussion of Assumptions,” item 1.