Experimental Analysis of Large-scale Learnable Vector Storage Compression
Abstract: Learnable embedding vectors are among the most important applications of machine learning and are widely used in various database-related domains. However, the high dimensionality of sparse data in recommendation tasks and the huge corpus volume in retrieval-related tasks lead to large memory consumption by the embedding table, which poses a great challenge to training and deploying models. Recent research has proposed various methods to compress embeddings at the cost of a slight decrease in model quality or the introduction of other overheads. Nevertheless, the relative performance of these methods remains unclear: existing experimental comparisons cover only a subset of them and focus on limited metrics. In this paper, we perform a comprehensive comparative analysis and experimental evaluation of embedding compression. We introduce a new taxonomy that categorizes these techniques based on their characteristics and methodologies, and we develop a modular benchmarking framework that integrates 14 representative methods. Under a uniform test environment, our benchmark fairly evaluates each approach, presents its strengths and weaknesses under different memory budgets, and recommends the best method for each use case. In addition to providing useful guidelines, our study uncovers the limitations of current methods and suggests potential directions for future research.