Non-transformer effective models for LLM operation
Determine whether the operation of a large language model, under a fixed prompt and task setting, can be accurately captured by a small effective sequence-to-sequence network that is not a transformer, and construct and characterize such an alternative effective model if it exists.
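To make the object of study concrete: one hypothetical candidate for such an effective model is a small recurrent encoder-decoder. The sketch below implements a tiny GRU-based sequence-to-sequence network in NumPy with greedy autoregressive decoding. Every detail here (the GRU choice, the hidden size, the decoding rule, all names) is an illustrative assumption, not a construction proposed in the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """A single GRU cell; one weight matrix per gate, acting on [input; hidden]."""
    def __init__(self, d_in, d_h):
        s = 1.0 / np.sqrt(d_h)
        self.Wz = rng.uniform(-s, s, (d_in + d_h, d_h))  # update gate
        self.Wr = rng.uniform(-s, s, (d_in + d_h, d_h))  # reset gate
        self.Wh = rng.uniform(-s, s, (d_in + d_h, d_h))  # candidate state

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(xh @ self.Wz)
        r = sigmoid(xh @ self.Wr)
        h_tilde = np.tanh(np.concatenate([x, r * h]) @ self.Wh)
        return (1 - z) * h + z * h_tilde

class TinySeq2Seq:
    """Hypothetical non-transformer effective model: GRU encoder + GRU decoder."""
    def __init__(self, vocab, d_h=16):
        self.emb = rng.normal(0, 0.1, (vocab, d_h))   # shared token embeddings
        self.enc = GRUCell(d_h, d_h)
        self.dec = GRUCell(d_h, d_h)
        self.out = rng.normal(0, 0.1, (d_h, vocab))   # hidden -> logits

    def generate(self, src_ids, n_steps, bos=0):
        h = np.zeros(self.emb.shape[1])
        for t in src_ids:                  # encode the fixed prompt
            h = self.enc.step(self.emb[t], h)
        tok, out = bos, []
        for _ in range(n_steps):           # greedy autoregressive decoding
            h = self.dec.step(self.emb[tok], h)
            tok = int(np.argmax(h @ self.out))
            out.append(tok)
        return out

model = TinySeq2Seq(vocab=32)
tokens = model.generate([3, 1, 4, 1, 5], n_steps=4)
```

Characterizing such a model against the LLM would then amount to fitting its (here randomly initialized) weights so that its output distribution matches the LLM's under the fixed prompt and task setting.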
References
However, one might ask whether the operation of the LLM could be modeled via some other effective sequence-to-sequence network. We leave this question to future work.
— A model of errors in transformers
(2601.14175 - Raju et al., 20 Jan 2026) in Appendix, Section “Discussion of Assumptions,” item 1