- The paper demonstrates that small models effectively complement LLMs through collaborative strategies, and can outcompete them in computation-constrained, task-specific, and interpretability-critical settings.
- It reveals that small models enhance data curation and enable the weak-to-strong paradigm, optimizing large model training and inference efficiency.
- The study highlights small models’ superior interpretability and domain adaptation, making them ideal for specialized, cost-sensitive AI applications.
Analyzing the Role of Small Models in the LLM Era
The paper "What is the Role of Small Models in the LLM Era: A Survey" by Lihu Chen and Gael Varoquaux addresses an underexplored but increasingly relevant topic: the place and significance of small models (SMs) in the epoch dominated by LLMs. This insightful survey highlights the dichotomy between LLMs and SMs through two primary lenses: Collaboration and Competition. The former underscores how SMs and LLMs can synergistically coexist, while the latter explores scenarios where SMs could potentially outperform LLMs.
Introduction and Context
LLMs such as GPT-4 and LLaMA-405B have shown exceptional abilities across a wide range of language tasks, pushing the boundaries of artificial general intelligence (AGI). However, the pursuit of ever more powerful models comes at a high computational and environmental cost. This scaling challenge raises questions about the practicality of deploying LLMs in resource-constrained environments. Consequently, the role and potential of smaller models in contemporary AI applications deserve a structured examination.
Collaboration: Complementary Strengths of LLMs and SMs
Data Curation
The paper points out that while larger datasets can enhance generalization in LLMs, not all data contributes equally to model performance. Here, small models can play a crucial role in curating high-quality data. Techniques such as data selection and data reweighting using small models can optimize pre-training datasets, thereby enhancing LLM performance while reducing computational overhead.
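To make the data-selection idea concrete, here is a minimal sketch in which a toy unigram model stands in for the small scoring model: candidate documents whose perplexity under the proxy is too high are dropped from the pre-training pool. The `UnigramLM` class and the threshold value are illustrative stand-ins, not the paper's actual method.

```python
# Sketch of small-model data selection: score each candidate document
# with a cheap proxy model and keep only fluent, low-perplexity text.
# A toy unigram model stands in for the small language model.
import math
from collections import Counter

class UnigramLM:
    """Toy stand-in for a small scoring LM: unigram MLE with add-one smoothing."""
    def __init__(self, corpus):
        tokens = " ".join(corpus).split()
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab = len(self.counts)

    def logprob(self, token):
        return math.log((self.counts[token] + 1) / (self.total + self.vocab))

    def perplexity(self, text):
        toks = text.split()
        return math.exp(-sum(self.logprob(t) for t in toks) / len(toks))

def select_data(lm, documents, max_ppl):
    """Keep documents the small model scores as fluent (low perplexity)."""
    return [d for d in documents if lm.perplexity(d) <= max_ppl]

reference = ["the cat sat on the mat", "the dog sat on the rug"]
lm = UnigramLM(reference)
candidates = ["the cat sat on the rug",   # in-distribution text
              "zxqv flurb gnar wumpus"]   # noise to be filtered
kept = select_data(lm, candidates, max_ppl=15.0)
```

In practice the proxy would be a small neural LM scoring web-scale text, but the shape of the pipeline — score with a cheap model, filter or reweight before the expensive training run — is the same.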
Weak-to-Strong Paradigm
Another interesting dimension of collaboration is the "Weak-to-Strong" paradigm, wherein weaker models are used to supervise and fine-tune more robust models. This methodology is particularly pertinent as LLMs evolve into superhuman models, often requiring supervision beyond human capabilities.
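The shape of that pipeline can be sketched with toy components: a crude "weak" labeler produces noisy pseudo-labels, and a "strong" learner is then trained on them. Both models below are deliberately simplistic stand-ins (a fixed threshold and a threshold search), not the paper's actual setup.

```python
# Toy weak-to-strong pipeline: a weak supervisor provides pseudo-labels,
# and a stronger learner is trained on those labels rather than on
# ground truth. All components here are illustrative stand-ins.

def weak_labeler(x):
    """Weak supervisor: a crude, biased threshold classifier."""
    return 1 if x > 0.1 else 0

def train_strong(data, pseudo_labels):
    """'Strong' learner: fits the threshold that best explains the pseudo-labels."""
    best_t, best_acc = 0.0, -1.0
    for t in data:
        acc = sum((1 if x > t else 0) == y
                  for x, y in zip(data, pseudo_labels)) / len(data)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

data = [-1.0, -0.5, -0.2, 0.05, 0.3, 0.7, 1.2]
pseudo = [weak_labeler(x) for x in data]       # weak model labels the data
threshold = train_strong(data, pseudo)         # strong model trained on weak labels
```

The research question the paradigm poses is whether the strong model can generalize beyond its supervisor's systematic mistakes; this toy only shows the supervision structure, not that generalization effect.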
Efficient Inference
Model ensembling techniques like model cascading and model routing are highlighted as effective measures to manage computational resources. By dynamically assigning queries to models of varying sizes, systems can achieve a balance between performance and resource efficiency. Speculative decoding extends this efficiency: a small model cheaply drafts candidate tokens, which the larger model then verifies in parallel, accepting or rejecting each draft.
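A minimal cascading sketch, assuming a confidence score is available from each model: answer with the small model when it is confident, and escalate to the large model otherwise. The two "models" below are placeholder functions returning `(answer, confidence)`.

```python
# Minimal model-cascading sketch: cheap model first, escalate on low confidence.

def small_model(query):
    # Placeholder: a cheap model that is confident only on short queries.
    conf = 0.9 if len(query.split()) <= 4 else 0.4
    return f"small:{query}", conf

def large_model(query):
    # Placeholder: the expensive fallback model.
    return f"large:{query}", 0.99

def cascade(query, threshold=0.8):
    answer, conf = small_model(query)
    if conf >= threshold:
        return answer               # cheap path: small model suffices
    answer, _ = large_model(query)  # escalate to the large model
    return answer
```

Routing differs only in that a separate lightweight classifier picks the target model before any generation happens, rather than inspecting the small model's own confidence afterwards.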
Evaluating LLMs
The authors emphasize the role of SMs in evaluating the outputs of LLMs. Traditional surface-overlap metrics are often inadequate for nuanced assessment of generated text. Model-based metrics such as BERTScore and BARTScore, which use smaller pretrained models to measure semantic similarity and other quality dimensions, provide more robust evaluation frameworks.
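The core mechanism behind BERTScore-style metrics can be sketched as greedy token matching in embedding space: each reference token is matched to its most similar candidate token, and the similarities are averaged. The 2-D embedding table below is a toy stand-in for contextual embeddings from a small pretrained model.

```python
# BERTScore-style recall sketch: greedy best-match cosine similarity
# between reference and candidate tokens. Toy 2-D embeddings stand in
# for real contextual embeddings.
import math

EMB = {"cat": (1.0, 0.0), "kitten": (0.9, 0.1), "dog": (0.0, 1.0), "sat": (0.5, 0.5)}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def bertscore_recall(candidate, reference):
    """Average, over reference tokens, of the best cosine match in the candidate."""
    cand = candidate.split()
    scores = [max(cos(EMB[r], EMB[c]) for c in cand) for r in reference.split()]
    return sum(scores) / len(scores)
```

Because "kitten" sits near "cat" in embedding space, a paraphrase scores higher than an unrelated substitution even though neither shares the exact surface token — precisely what n-gram metrics miss. (The real metric also computes precision and an F1, and uses IDF weighting.)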
Domain Adaptation
The adaptability of LLMs to specific domains is another focus area, where small domain-specific models guide the larger models, either through white-box adaptation involving internal state adjustments or black-box adaptation using retrieved external knowledge.
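One white-box flavor of this guidance, in the spirit of proxy-tuning approaches, shifts the large model's next-token logits by the difference between a domain-tuned small model and its untuned base. The vocabulary and logit values below are made up for illustration.

```python
# White-box adaptation sketch: steer the large model's next-token
# distribution with the (tuned small - base small) logit offset.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def adapted_logits(large, small_tuned, small_base, alpha=1.0):
    """Elementwise: large + alpha * (small_tuned - small_base)."""
    return [l + alpha * (t - b) for l, t, b in zip(large, small_tuned, small_base)]

# Toy 3-token vocabulary; the tuned small model strongly prefers token 2
# (imagine a domain-specific term the large model underweights).
large       = [2.0, 1.0, 0.5]
small_base  = [1.0, 1.0, 1.0]
small_tuned = [0.5, 1.0, 3.0]
adapted = softmax(adapted_logits(large, small_tuned, small_base))
```

Black-box adaptation, by contrast, never touches logits: the small model retrieves or generates domain knowledge that is injected into the prompt, as in the retrieval techniques discussed next.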
Retrieval-Augmented Generation and Prompt-based Learning
Small models, serving as efficient retrievers and augmenters for prompt-based learning, can significantly enhance the performance of LLMs. Techniques involving the retrieval of relevant documents or the decomposition of complex prompts into simpler tasks showcase the practical applicability of small models.
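A minimal retrieval-augmented prompting sketch: a small retriever scores documents against the query (here by naive word overlap) and the best hit is spliced into the prompt for the LLM. Real systems would use a dense or sparse retriever; the scoring function below is deliberately simplistic.

```python
# Retrieval-augmented generation sketch: small retriever + prompt assembly.

def score(query, doc):
    """Naive relevance: fraction of query words appearing in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def retrieve(query, docs):
    return max(docs, key=lambda d: score(query, d))

def build_prompt(query, docs):
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

docs = ["Paris is the capital of France",
        "The mitochondrion is the powerhouse of the cell"]
prompt = build_prompt("What is the capital of France?", docs)
```

The division of labor is the point: the small, cheap component handles relevance ranking over the corpus, and the expensive LLM only reads the few passages that survive retrieval.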
Deficiency Repair
Lastly, small models play a pivotal role in mitigating deficiencies of LLMs, such as hallucinations and repetitive text, through techniques like contrastive decoding and specialized plug-ins that target specific weaknesses.
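The core of contrastive decoding can be sketched in a few lines: score each candidate token by the large ("expert") model's log-probability minus the small ("amateur") model's, which penalizes the generic or repetitive continuations that both models rate highly. The toy distributions below are invented, and the real method adds a plausibility constraint restricting which tokens are eligible.

```python
# Contrastive-decoding sketch: prefer tokens where the large model's
# log-probability most exceeds the small model's. Toy distributions.
import math

def contrastive_scores(p_large, p_small):
    return {t: math.log(p_large[t]) - math.log(p_small[t]) for t in p_large}

# Next-token probabilities over a toy 3-word vocabulary.
p_large = {"the": 0.40, "quantum": 0.35, "banana": 0.25}
p_small = {"the": 0.60, "quantum": 0.10, "banana": 0.30}

scores = contrastive_scores(p_large, p_small)
chosen = max(scores, key=scores.get)
```

Here "the" is the raw argmax of the large model, but the contrastive score demotes it because the small model likes it almost as much; the token carrying information the small model lacks wins instead.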
Competition: Instances Favoring Small Models
Despite the impressive capabilities of LLMs, they may not always be the optimal choice. The survey identifies three primary scenarios where SMs could be preferable:
Computation-Constrained Environments
In environments where computational resources are limited, such as mobile devices or real-time applications, the efficiency and lower resource demands of SMs make them ideal.
Task-Specific Environments
Certain specialized tasks or domains, such as biomedical text mining or tabular data, do not benefit significantly from the extensive parameter count of LLMs. Fine-tuned small models can often achieve comparable or superior performance while being more resource-efficient.
Interpretability-Required Environments
In high-stakes fields like healthcare, finance, and legal services, the interpretability of models is crucial. Smaller, simpler models tend to offer better transparency, making them more suitable and trustworthy in these applications.
Conclusion
The paper concludes by emphasizing the importance of carefully weighing the trade-offs between performance and efficiency when choosing between LLMs and SMs. While LLMs offer unparalleled capabilities, SMs bring accessibility, simplicity, lower costs, and interpretability, making them invaluable in many practical settings. The survey underscores the need for a balanced approach that leverages both types of models to optimize computational resources and achieve cost-effective improvements to AI systems. Such balance is crucial for promoting more sustainable and democratized access to advanced AI technologies.