Papers
Topics
Authors
Recent
Search
2000 character limit reached

Lossless data compression by large models

Published 24 Jun 2024 in cs.IT, cs.AI, and math.IT | (2407.07723v3)

Abstract: Modern data compression methods are slowly reaching their limits after 80 years of research, millions of papers, and wide range of applications. Yet, the extravagant 6G communication speed requirement raises a major open question for revolutionary new ideas of data compression. We have previously shown all understanding or learning are compression, under reasonable assumptions. LLMs understand data better than ever before. Can they help us to compress data? The LLMs may be seen to approximate the uncomputable Solomonoff induction. Therefore, under this new uncomputable paradigm, we present LMCompress. LMCompress shatters all previous lossless compression algorithms, doubling the lossless compression ratios of JPEG-XL for images, FLAC for audios, and H.264 for videos, and quadrupling the compression ratio of bz2 for texts. The better a large model understands the data, the better LMCompress compresses.

Summary

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 8 tweets with 537 likes about this paper.

HackerNews

  1. Understanding Is Compression (3 points, 1 comment) 
  2. Lossless data compression by large models (2 points, 0 comments)