Origin of cross-genre differences in code length and conditional entropy
Ascertain whether the observed cross-genre differences in how the code length $L(N)$ and the conditional entropy $s(N)$ depend on the context length $N$ arise from inherent properties of the texts in each genre, or from how each genre relates to the training data of the large language models used to estimate these quantities.
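The excerpt does not define $L(N)$ or $s(N)$ precisely. The sketch below shows how such curves might be estimated for one genre from a model's per-token probabilities. It assumes, without confirmation from the source, that $s(N)$ is the model's mean surprisal in bits for a token predicted from $N$ preceding tokens, and that $L(N)$ is the mean cumulative code length of the first $N$ tokens, so that $s(N) = L(N+1) - L(N)$. The function names and the input format (one list of model probabilities per document) are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical helpers: the source does not specify how L(N) and s(N) are computed.


def surprisal_bits(p: float) -> float:
    """Code length, in bits, of a token the model assigned probability p."""
    return -math.log2(p)


def conditional_entropy_curve(docs: Sequence[Sequence[float]]) -> list[float]:
    """Estimate s(N) for N = 0, 1, ..., averaged over one genre's documents.

    docs[i][n] is the probability the model gave to token n of document i,
    conditioned on tokens 0..n-1, so that token is predicted from a context
    of length n. Documents are truncated to the shortest one so that every
    s(N) is averaged over the same set of documents.
    """
    n_max = min(len(d) for d in docs)
    curve = []
    for n in range(n_max):
        bits = [surprisal_bits(d[n]) for d in docs]
        curve.append(sum(bits) / len(bits))
    return curve


def code_length_curve(s: Sequence[float]) -> list[float]:
    """L(N): mean code length of the first N tokens, with L(0) = 0.

    Under the assumed convention, L(N + 1) - L(N) = s(N).
    """
    lengths = [0.0]
    for value in s:
        lengths.append(lengths[-1] + value)
    return lengths


# Toy example: two documents from one genre, with made-up probabilities.
genre_docs = [[0.5, 0.25], [0.125, 0.5]]
s = conditional_entropy_curve(genre_docs)
L = code_length_curve(s)
print(s)  # [2.0, 1.5]
print(L)  # [0.0, 2.0, 3.5]
```

Because both curves are computed from a model's probabilities rather than from the true distribution of the text, a difference between two genres' curves mixes properties of the texts with how well the model's training data covers each genre. This is the ambiguity the problem asks to resolve.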
References
On the other hand we see significant differences across genres in the dependence of code length $L(N)$ and conditional entropy $s(N)$ on the context length $N$, and it is not clear whether this reflects inherent features of the real text or each genre's relationship to the models' training sets.
— Large language models and the entropy of English
(2512.24969 - Scheibner et al., 31 Dec 2025) in Summary (final paragraph of main text)