Masked Mixers for Language Generation and Retrieval

Published 2 Sep 2024 in cs.CL and cs.LG (arXiv:2409.01482v4)

Abstract: Attention mechanisms that confer selective focus on a strict subset of input elements are nearly ubiquitous in LLMs today. We posit that there is a downside to the use of attention: most input information is lost. In support of this idea, we observe poor input representation accuracy in transformers and more accurate representation in what we term masked mixers, which replace self-attention with masked convolutions. The masked mixer learns causal language modeling more efficiently than early transformer implementations and even outperforms optimized, current transformers when training on small ($n_{ctx}<512$) but not larger context windows. Evidence is presented for the hypothesis that differences in transformer and masked mixer training efficiencies on various tasks are best predicted by input representation accuracy, or equivalently by global invertibility. We hypothesize that the information loss exhibited by transformers would be more detrimental to retrieval than to generation, as the former is more closely approximated by a bijective, and thus invertible, function. We find that masked mixers are more effective retrieval models both when the pretrained embedding model is left unchanged and when it is modified via cosine similarity-based InfoNCE loss minimization. A small masked mixer is shown to outperform a large, near state-of-the-art transformer-based retrieval model, despite the latter being trained with many orders of magnitude more data and compute.
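The abstract names two technical components that short sketches can make concrete. First, the architectural change: self-attention is replaced by masked convolutions, that is, a learned mixing over the sequence dimension whose weights are constrained to be lower triangular so each position receives information only from earlier positions. The PyTorch sketch below is a minimal rendering of that idea; the class name, initialization, and shapes are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a causal (masked) token-mixing layer: attention is
# replaced by a learned mixing over the sequence dimension, masked to be
# lower triangular so position i sees only positions j <= i.
import torch
import torch.nn as nn

class CausalTokenMixer(nn.Module):
    def __init__(self, n_ctx: int):
        super().__init__()
        # Dense mixing weights over the token dimension; equivalent to a
        # 1D convolution spanning the full context window.
        self.weight = nn.Parameter(torch.randn(n_ctx, n_ctx) / n_ctx ** 0.5)
        # Lower-triangular mask enforces the causal language-modeling constraint.
        self.register_buffer("mask", torch.tril(torch.ones(n_ctx, n_ctx)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_ctx, d_model); mix across tokens, not features.
        w = self.weight * self.mask
        return torch.einsum("ij,bjd->bid", w, x)
```

Second, the retrieval fine-tuning objective: a cosine similarity-based InfoNCE loss, in which each query's matching passage serves as the positive and the other in-batch passages as negatives. The temperature value below is a common default and an assumption, not a figure taken from the paper.

```python
import torch
import torch.nn.functional as F

def infonce_loss(query_emb: torch.Tensor,
                 passage_emb: torch.Tensor,
                 temperature: float = 0.07) -> torch.Tensor:
    # Row i of query_emb and passage_emb form a matched (positive) pair.
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature           # scaled cosine similarities
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)  # positives lie on the diagonal
```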
