Probabilistic Transformers
Published 15 Oct 2020 in cs.LG and stat.ML (arXiv:2010.15583v3)
Abstract: We show that Transformers are Maximum Posterior Probability estimators for Mixtures of Gaussian Models. This brings a probabilistic point of view to Transformers and suggests extensions to other probabilistic cases.
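One simple way to see the connection the abstract describes is to compare softmax attention weights with the posterior component responsibilities of a Gaussian mixture. The sketch below is an illustration under stated assumptions, not the paper's derivation: it assumes isotropic components with shared variance `sigma2` centered on the keys, and mixture priors chosen proportional to exp(||k_j||^2 / (2 sigma2)) so that the key-norm term cancels. The function names and the random test data are hypothetical. It only checks that the weights coincide; it does not reproduce the paper's MAP argument.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def attention_weights(q, K, sigma2=1.0):
    # Dot-product attention weights: softmax_j(q . k_j / sigma2).
    return softmax(K @ q / sigma2)

def gmm_responsibilities(q, K, sigma2=1.0):
    # Posterior p(j | q) for an isotropic Gaussian mixture with means k_j.
    # Assumption: log prior = ||k_j||^2 / (2 sigma2) + const, which cancels
    # the key-norm term of the Gaussian log-likelihood.
    log_prior = 0.5 * np.sum(K ** 2, axis=1) / sigma2
    log_lik = -0.5 * np.sum((q - K) ** 2, axis=1) / sigma2
    return softmax(log_prior + log_lik)

rng = np.random.default_rng(0)
q = rng.normal(size=4)        # one query vector
K = rng.normal(size=(5, 4))   # five keys, used as mixture means
V = rng.normal(size=(5, 3))   # five values

w_attn = attention_weights(q, K)
w_gmm = gmm_responsibilities(q, K)
out = w_attn @ V              # attention output: weighted average of values

print(np.allclose(w_attn, w_gmm))
```

The two weight vectors agree because the log-prior plus log-likelihood equals q . k_j / sigma2 minus a term that depends only on q, and softmax ignores terms that are constant across j. The usual 1/sqrt(d) scaling in dot-product attention corresponds to setting `sigma2 = sqrt(d)` in this sketch.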