Slaves to the Law of Large Numbers: An Asymptotic Equipartition Property for Perplexity in Generative Language Models

Published 22 May 2024 in cs.CL, cs.AI, cs.IT, and math.IT | arXiv:2405.13798v3

Abstract: We prove a new asymptotic equipartition property for the perplexity of long texts generated by an LLM and present supporting experimental evidence from open-source models. Specifically, we show that the logarithmic perplexity of any long text generated by an LLM must asymptotically converge to the average entropy of its token distributions. This defines a "typical set" to which all long synthetic texts generated by an LLM must belong. We show that this typical set is a vanishingly small subset of all possible grammatically correct outputs. These results suggest possible applications to important practical problems such as (a) detecting synthetic AI-generated text, and (b) testing whether a text was used to train an LLM. We make no simplifying assumptions (such as stationarity) about the statistics of LLM outputs, and therefore our results are directly applicable to practical real-world models without any approximations.
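The central claim can be illustrated numerically: if tokens are sampled from a sequence of (possibly non-stationary) per-step distributions, the average negative log-probability of the sampled tokens (the log-perplexity) converges to the average per-step entropy by the law of large numbers. The sketch below is not the paper's code; the `next_token_distribution` function is a hypothetical stand-in for a real LLM's conditional next-token distributions, deliberately made time-varying to mirror the paper's no-stationarity assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 50       # toy vocabulary size
T = 20_000   # number of generated tokens

def next_token_distribution(t):
    """Toy stand-in for an LLM's step-t conditional distribution.
    Deliberately non-stationary: the logits' scale drifts with t."""
    logits = rng.standard_normal(V) * (1 + 0.5 * np.sin(t / 100.0))
    p = np.exp(logits - logits.max())
    return p / p.sum()

neg_log_probs = []  # -log p_t(x_t) for each sampled token x_t
entropies = []      # H(p_t), the entropy of each step's distribution
for t in range(T):
    p = next_token_distribution(t)
    x = rng.choice(V, p=p)
    neg_log_probs.append(-np.log(p[x]))
    entropies.append(-np.sum(p * np.log(p)))

# Log-perplexity of the generated text vs. average token-distribution entropy:
log_perplexity = np.mean(neg_log_probs)
avg_entropy = np.mean(entropies)
# By the AEP, these two quantities converge as T grows.
print(log_perplexity, avg_entropy)
```

In this toy run the two averages agree to within a few hundredths of a nat, even though each step's distribution is different, which is exactly the concentration the paper's typical-set argument relies on.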
