Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

Published 4 Jan 2024 in cs.CL and cs.AI | (2401.02994v3)

Abstract: In conversational AI research, there's a noticeable trend towards developing models with a larger number of parameters, exemplified by models like ChatGPT. While these expansive models tend to generate increasingly better chat responses, they demand significant computational resources and memory. This study explores a pertinent question: Can a combination of smaller models collaboratively achieve comparable or enhanced performance relative to a singular large model? We introduce an approach termed "blending", a straightforward yet effective method of integrating multiple chat AIs. Our empirical evidence suggests that when specific smaller models are synergistically blended, they can potentially outperform or match the capabilities of much larger counterparts. For instance, integrating just three models of moderate size (6B/13B parameters) can rival or even surpass the performance metrics of a substantially larger model like ChatGPT (175B+ parameters). This hypothesis is rigorously tested using A/B testing methodologies with a large user base on the Chai research platform over a span of thirty days. The findings underscore the potential of the "blending" strategy as a viable approach for enhancing chat AI efficacy without a corresponding surge in computational demands.


Summary

  • The paper introduces a Blending technique that integrates multiple smaller chat AI models to outperform a single large model.
  • The methodology employs an ensemble of three models (6–13B parameters) to achieve superior user engagement and retention on the Chai platform.
  • Empirical results reveal that the blended approach delivers dynamic interactions and cost efficiency, paving the way for innovative conversational AI strategies.

Introduction

The field of conversational AI, particularly involving LLMs such as ChatGPT, has seen a trend toward creating ever-larger models to improve the quality of chat responses. However, these large models, often with hundreds of billions of parameters, come with significant computational and memory requirements. A recently introduced methodology called "Blending" addresses whether multiple smaller models combined could match or exceed the performance of a singular, larger model in the context of conversational AI.

Blending Methodology

The Blending technique integrates multiple smaller chat AI systems so that they work collaboratively, enabling the combined system to generate responses that harness the strengths of each individual model. Empirical tests on the Chai research platform have demonstrated that an ensemble comprising three models, each with 6 to 13 billion parameters, can outdo a single model like ChatGPT, which has over 175 billion parameters. This is particularly noteworthy because the blended ensemble also yields significant improvements in user retention (indicating a more engaging user experience) while requiring only a fraction of the computational cost associated with larger models.
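The core mechanism can be sketched in a few lines: at each conversation turn, one component model is sampled uniformly at random and generates the reply, conditioned on the full history, including turns produced by the other models. The `ChatModel` class and `generate` method below are illustrative stand-ins, not the paper's actual implementation:

```python
import random

class ChatModel:
    """Stand-in for a real chat AI; generate() would call the model."""
    def __init__(self, name):
        self.name = name

    def generate(self, history):
        # Placeholder response; a real model would condition on history.
        return f"[{self.name}] reply to: {history[-1]}"

def blended_reply(models, history):
    """One Blending step: pick a component model uniformly at random
    and let it respond, conditioned on the FULL conversation history,
    including turns written by the other component models."""
    model = random.choice(models)
    return model.generate(history)

models = [ChatModel("chat-6B-a"), ChatModel("chat-6B-b"), ChatModel("chat-13B")]
history = ["Hi, what should we talk about?"]
for _ in range(3):
    history.append(blended_reply(models, history))
```

Because every model sees the whole blended history, each one implicitly builds on the strengths of the others, which is what lets the ensemble behave like a single, more capable system.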

Empirical Evidence and Findings

A blend of smaller models, with the responding model drawn uniformly at random at each turn, appears to exhibit the "best of all" individual model characteristics, infusing diversity and a degree of specialized expertise into the chat responses. This results in more dynamic and engaging interactions for users. Over the thirty-day research period, user interaction statistics indicated superior performance of the blended models on both engagement and retention metrics, outpacing the singular large model.
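The retention comparison underlying such an A/B test can be sketched as follows; the user sets and the resulting lift are made-up illustrative data, not the paper's reported numbers, and the 30-day window mirrors the study period:

```python
def retention_rate(day1_users, day_k_users):
    """Fraction of day-1 users still active on day k."""
    return len(day1_users & day_k_users) / len(day1_users)

# Hypothetical user IDs for the two A/B arms (illustrative only).
control_day1  = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}   # single large model
control_day30 = {1, 3, 5}
blended_day1  = {11, 12, 13, 14, 15, 16, 17, 18, 19, 20}
blended_day30 = {11, 12, 13, 14, 15}

# Relative lift of the blended arm over the control arm.
lift = (retention_rate(blended_day1, blended_day30)
        / retention_rate(control_day1, control_day30))
print(f"retention lift: {lift:.2f}x")  # prints "retention lift: 1.67x"
```

Engagement can be compared the same way, with per-user interaction counts replacing the active-user sets.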

Implications and Future Directions

The key takeaway from the study is that increasing sheer model size may not be the only path toward enhancing conversational AI. Blending smaller models keeps computational demands low while markedly improving user engagement and conversation quality. Future research plans include scaling the number of component systems to further enrich conversation diversity, and training classifiers to predict the optimal chat AI to respond at any given turn in order to maximize engagement. This would replace uniform random choice with a more nuanced selection process and allow new models to be added without risking degraded performance.
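The proposed shift from uniform random selection to learned selection might look like the sketch below; the paper only proposes training such a classifier, so the `scorer` function here is a hypothetical stand-in for it:

```python
import random

def select_model(models, history, scorer=None):
    """Pick which chat AI responds at this turn.

    With no scorer, this is the current Blending behaviour: uniform
    random choice. With a scorer (a hypothetical trained classifier
    mapping (model, history) -> predicted engagement), pick the model
    the classifier rates highest."""
    if scorer is None:
        return random.choice(models)
    scores = [scorer(m, history) for m in models]
    return models[scores.index(max(scores))]
```

A useful property of this design is that adding a new component model only requires the scorer to learn when that model helps; if it never scores highest, overall behaviour cannot degrade below the existing blend.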

Conclusion

The Blending approach presents a compelling alternative to the industry's current trajectory of building increasingly large LLMs for conversational AI. The evidence suggests that a collaborative multi-model approach yields significant improvements in user engagement while maintaining leaner computational requirements. As this methodology finds its way into practice, it has the potential to redefine strategies for developing future chat AIs, favoring a collaborative, multi-faceted approach over sheer size and scale.
