New preprint: "LLMs are Slaves to the Law of Large Numbers" (https://arxiv.org/abs/2405.13798). We propose a new asymptotic equipartition property for the perplexity of a large piece of text generated by a language model and present theoretical arguments for this property. Perplexity, defined as the inverse of the geometric mean of the per-token likelihood, is widely used as a performance metric for training language models. Our main result states that the …
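For reference, here is one standard way to write that definition (the notation below is mine, not necessarily the preprint's), assuming the model assigns conditional probabilities p(x_i | x_{<i}) to each token of a text x_{1:N}:

\[
\mathrm{PPL}(x_{1:N})
\;=\;
\left( \prod_{i=1}^{N} p(x_i \mid x_{<i}) \right)^{-1/N}
\;=\;
\exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(x_i \mid x_{<i}) \right).
\]

Written in the exponential form, the connection to an asymptotic equipartition property is easy to see: if the average negative log-likelihood per token obeys a law of large numbers as N grows, then the perplexity of a long generated text concentrates around a deterministic value.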