Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

Published 4 Jan 2024 in cs.CL and cs.AI | (2401.02994v3)

Abstract: In conversational AI research, there's a noticeable trend towards developing models with a larger number of parameters, exemplified by models like ChatGPT. While these expansive models tend to generate increasingly better chat responses, they demand significant computational resources and memory. This study explores a pertinent question: Can a combination of smaller models collaboratively achieve comparable or enhanced performance relative to a singular large model? We introduce an approach termed "blending", a straightforward yet effective method of integrating multiple chat AIs. Our empirical evidence suggests that when specific smaller models are synergistically blended, they can potentially outperform or match the capabilities of much larger counterparts. For instance, integrating just three models of moderate size (6B/13B parameters) can rival or even surpass the performance metrics of a substantially larger model like ChatGPT (175B+ parameters). This hypothesis is rigorously tested using A/B testing methodologies with a large user base on the Chai research platform over a span of thirty days. The findings underscore the potential of the "blending" strategy as a viable approach for enhancing chat AI efficacy without a corresponding surge in computational demands.


Summary

  • The paper introduces a Blending technique that integrates multiple smaller chat AI models to outperform a single large model.
  • The methodology employs an ensemble of three models (6–13B parameters) to achieve superior user engagement and retention on the Chai platform.
  • Empirical results reveal that the blended approach delivers dynamic interactions and cost efficiency, paving the way for innovative conversational AI strategies.

Introduction

The field of conversational AI, particularly involving LLMs such as ChatGPT, has seen a trend toward creating ever-larger models to improve the quality of chat responses. However, these large models, often with hundreds of billions of parameters, come with significant computational and memory requirements. A recently introduced methodology called "Blending" addresses whether multiple smaller models combined could match or exceed the performance of a singular, larger model in the context of conversational AI.

Blending Methodology

The Blending technique integrates multiple smaller chat AI systems so that they work collaboratively, enabling the combined system to generate responses that harness the strengths of each individual model. Empirical tests on the Chai research platform have demonstrated that an ensemble comprising three models, each with 6 to 13 billion parameters, can outdo a single model like ChatGPT, which has over 175 billion parameters. This is particularly noteworthy because the blended ensemble also yields significant improvements in user retention (indicating a more engaging user experience) while requiring only a fraction of the computational cost associated with larger models.
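The core mechanism can be sketched in a few lines: at each conversation turn, one component model is sampled uniformly at random and generates the reply, conditioned on the full history, including turns produced by the other models. The `ChatModel` class and `generate` method below are illustrative stand-ins, not the paper's actual implementation:

```python
import random

class ChatModel:
    """Stand-in for a real chat AI; generate() would call the model."""
    def __init__(self, name):
        self.name = name

    def generate(self, history):
        # Placeholder response; a real model would condition on history.
        return f"[{self.name}] reply to: {history[-1]}"

def blended_reply(models, history):
    """One Blending step: pick a component model uniformly at random
    and let it respond, conditioned on the FULL conversation history,
    including turns written by the other component models."""
    model = random.choice(models)
    return model.generate(history)

models = [ChatModel("chat-6B-a"), ChatModel("chat-6B-b"), ChatModel("chat-13B")]
history = ["Hi, what should we talk about?"]
for _ in range(3):
    history.append(blended_reply(models, history))
```

Because every model sees the whole blended history, each one implicitly builds on the strengths of the others, which is what lets the ensemble behave like a single, more capable system.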

Empirical Evidence and Findings

A blend of smaller models, with the responding model drawn uniformly at random at each turn, appears to exhibit the "best of all" individual model characteristics, infusing diversity and a degree of specialized expertise into the chat responses. This results in more dynamic and engaging interactions for users. Over the thirty-day research period, user interaction statistics indicated superior performance of the blended models on both engagement and retention metrics, outpacing the singular large model.
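The retention comparison underlying such an A/B test can be sketched as follows; the user sets and the resulting lift are made-up illustrative data, not the paper's reported numbers, and the 30-day window mirrors the study period:

```python
def retention_rate(day1_users, day_k_users):
    """Fraction of day-1 users still active on day k."""
    return len(day1_users & day_k_users) / len(day1_users)

# Hypothetical user IDs for the two A/B arms (illustrative only).
control_day1  = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}   # single large model
control_day30 = {1, 3, 5}
blended_day1  = {11, 12, 13, 14, 15, 16, 17, 18, 19, 20}
blended_day30 = {11, 12, 13, 14, 15}

# Relative lift of the blended arm over the control arm.
lift = (retention_rate(blended_day1, blended_day30)
        / retention_rate(control_day1, control_day30))
print(f"retention lift: {lift:.2f}x")  # prints "retention lift: 1.67x"
```

Engagement can be compared the same way, with per-user interaction counts replacing the active-user sets.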

Implications and Future Directions

The key takeaway from the study is that increasing sheer model size may not be the only path toward enhancing conversational AI. Blending smaller models keeps computational demands low while markedly improving user engagement and conversation quality. Future research plans include scaling the number of component systems to further enrich conversation diversity, and training classifiers to predict the optimal chat AI to respond at any given turn in order to maximize engagement. This would replace uniform random choice with a more nuanced selection process and allow new models to be added without risking degraded performance.
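The proposed shift from uniform random selection to learned selection might look like the sketch below; the paper only proposes training such a classifier, so the `scorer` function here is a hypothetical stand-in for it:

```python
import random

def select_model(models, history, scorer=None):
    """Pick which chat AI responds at this turn.

    With no scorer, this is the current Blending behaviour: uniform
    random choice. With a scorer (a hypothetical trained classifier
    mapping (model, history) -> predicted engagement), pick the model
    the classifier rates highest."""
    if scorer is None:
        return random.choice(models)
    scores = [scorer(m, history) for m in models]
    return models[scores.index(max(scores))]
```

A useful property of this design is that adding a new component model only requires the scorer to learn when that model helps; if it never scores highest, overall behaviour cannot degrade below the existing blend.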

Conclusion

The Blending approach presents a compelling alternative to the industry's current trajectory of building increasingly large LLMs for conversational AI. The evidence suggests that a collaborative multi-model approach yields significant improvements in user engagement while maintaining leaner computational requirements. As this methodology finds its way into practice, it has the potential to redefine strategies for developing future chat AIs, favoring a collaborative, multi-faceted approach over sheer size and scale.
