CoLLiE: Collaborative Training of Large Language Models in an Efficient Way

Published 1 Dec 2023 in cs.CL | (2312.00407v1)

Abstract: LLMs are increasingly pivotal in a wide range of natural language processing tasks. Access to pre-trained models, courtesy of the open-source community, has made it possible to adapt these models to specific applications for enhanced performance. However, the substantial resources required for training these models necessitate efficient solutions. This paper introduces CoLLiE, an efficient library that facilitates collaborative training of LLMs using 3D parallelism, parameter-efficient fine-tuning (PEFT) methods, and optimizers such as Lion, Adan, Sophia, LOMO and AdaLomo. With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization. CoLLiE has demonstrated superior training efficiency in comparison with prevalent solutions in pre-training and fine-tuning scenarios. Furthermore, we provide an empirical evaluation of the correlation between model size and GPU memory consumption under different optimization methods, as well as an analysis of the throughput. Lastly, we carry out a comprehensive comparison of various optimizers and PEFT methods within the instruction-tuning context. CoLLiE is available at https://github.com/OpenLMLab/collie.

Summary

The paper introduces CoLLiE, a library that integrates 3D parallelism and PEFT methods to make large language model training more efficient.
It describes a modular design and bundles optimizers such as LOMO that reduce memory consumption or speed up convergence in LLM training.
Empirical results show throughput gains and memory savings, and the library is further validated by instruction-tuning a LLaMA-65B model.

CoLLiE: Collaborative Training of LLMs in an Efficient Way

The paper introduces CoLLiE, a library designed to make collaborative training of LLMs efficient. As model sizes grow, so do computational demands, which makes efficient use of resources essential. CoLLiE addresses this through 3D parallelism, parameter-efficient fine-tuning (PEFT) methods, and an array of optimizers such as Lion, Adan, Sophia, LOMO, and AdaLomo.

Key Features and Contributions

3D Parallelism: CoLLiE combines tensor parallelism, pipeline parallelism, and ZeRO-3. Together these partition the model and its training state across multiple GPUs, which makes it possible to train models that do not fit on a single device.
Parameter-efficient Fine-tuning: PEFT methods incorporated into CoLLiE, such as LoRA and prompt-tuning, train only a small set of parameters while the rest of the model stays frozen, which reduces memory use.
Optimizer Integration: The library ships several optimizers suited to LLM training that aim to save memory or converge faster. A notable inclusion is the LOMO optimizer, which minimizes memory usage by not retaining any optimizer states.
FlashAttention: CoLLiE integrates FlashAttention to improve computational efficiency during training, significantly boosting throughput.
Modular Design: The architecture of CoLLiE promotes extensibility, coupling ease of customization with a user-friendly configuration interface through the CollieConfig class.
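To make the 3D parallelism item above more concrete, the following is a minimal sketch of how a global GPU rank can be mapped onto a three-dimensional grid. It is not CoLLiE's implementation: the function names, the treatment of ZeRO-3 as the data-parallel axis, and the rank ordering (tensor-parallel index varies fastest, a common convention) are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class ParallelCoords(NamedTuple):
    """Position of one GPU in a hypothetical 3D parallel grid."""
    dp: int  # data-parallel replica index (assumed here to be the ZeRO-3 axis)
    pp: int  # pipeline stage index
    tp: int  # tensor-parallel shard index


def rank_to_coords(rank: int, tp_size: int, pp_size: int, dp_size: int) -> ParallelCoords:
    """Map a global rank to grid coordinates, with tp varying fastest (an assumed layout)."""
    world_size = tp_size * pp_size * dp_size
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world size {world_size}")
    tp = rank % tp_size
    pp = (rank // tp_size) % pp_size
    dp = rank // (tp_size * pp_size)
    return ParallelCoords(dp=dp, pp=pp, tp=tp)


def tensor_parallel_group(rank: int, tp_size: int, pp_size: int, dp_size: int) -> List[int]:
    """Return the global ranks that share this rank's pipeline stage and data-parallel replica."""
    coords = rank_to_coords(rank, tp_size, pp_size, dp_size)
    base = (coords.dp * pp_size + coords.pp) * tp_size
    return list(range(base, base + tp_size))


if __name__ == "__main__":
    # Eight GPUs arranged as 2 (tp) x 2 (pp) x 2 (dp).
    for r in range(8):
        print(r, rank_to_coords(r, tp_size=2, pp_size=2, dp_size=2))
```

With this layout, ranks in the same tensor-parallel group are numerically adjacent, which tends to place the most communication-heavy dimension on GPUs within one node. Whether CoLLiE orders ranks this way is not stated in the summary.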

Performance Assessment

The numerical results in the paper illustrate CoLLiE's superior training efficiency across various dimensions:

Memory Requirements: The study profiles GPU memory usage and finds substantial reductions with optimizers like LOMO and with PEFT methods, which bring memory consumption down to approximately 2.1 times the model parameters' size.
Throughput: Experiments show that CoLLiE achieves higher throughput than prevalent solutions, particularly on hardware with communication bottlenecks. The paper attributes this in particular to combining tensor parallelism (TP) and pipeline parallelism (PP).
Empirical Validation: Instruction-tuning LLaMA-65B with CoLLiE yields significant improvements on tasks that test factual knowledge and instruction-following capabilities.
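The memory figure above can be turned into a rough back-of-the-envelope estimate. The sketch below is only illustrative arithmetic, not a result from the paper. The summary does not say whether "2.1 times" is relative to the parameter count (about 2.1 bytes per parameter) or to the byte size of 16-bit weights (about 4.2 bytes per parameter), so both readings are shown as assumptions. The 16 bytes per parameter for mixed-precision Adam is the commonly cited accounting of model states (16-bit weights and gradients plus 32-bit master weights, momentum, and variance) and is included only as a reference point.

```python
def state_size_gb(num_params: float, bytes_per_param: float) -> float:
    """Estimated memory in decimal gigabytes for a given per-parameter cost."""
    return num_params * bytes_per_param / 1e9


# Commonly cited model-state cost of mixed-precision Adam (not a figure from this summary):
# 2 (fp16 weights) + 2 (fp16 gradients) + 4 + 4 + 4 (fp32 master weights, momentum, variance).
ADAM_MIXED_PRECISION_BYTES = 2 + 2 + 4 + 4 + 4

# Two assumed readings of the "approximately 2.1 times" figure.
RATIO = 2.1
BYTES_IF_RELATIVE_TO_PARAM_COUNT = RATIO       # 2.1 bytes per parameter
BYTES_IF_RELATIVE_TO_FP16_WEIGHTS = RATIO * 2  # 2.1 x (2 bytes per fp16 weight)


if __name__ == "__main__":
    n = 65e9  # parameter count of a 65B model
    print(f"Adam, mixed precision:      {state_size_gb(n, ADAM_MIXED_PRECISION_BYTES):.1f} GB")
    print(f"2.1 x parameter count:      {state_size_gb(n, BYTES_IF_RELATIVE_TO_PARAM_COUNT):.1f} GB")
    print(f"2.1 x fp16 weight size:     {state_size_gb(n, BYTES_IF_RELATIVE_TO_FP16_WEIGHTS):.1f} GB")
```

Under either reading, the estimate for a 65B model is several times smaller than the roughly 1,040 GB of model states that mixed-precision Adam would need, before counting activations.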

Implications and Future Work

CoLLiE has practical implications for NLP researchers and practitioners. By making training more efficient, it allows experimentation with larger models in resource-constrained environments. Directions for future research include fine-grained profiling of memory allocation and extending the empirical evaluations across more model scales and training methodologies.

Conclusion

CoLLiE offers a comprehensive approach to training LLMs efficiently. It combines support for 3D parallelism, parameter-efficient fine-tuning methods, and a suite of optimizers including LOMO and AdaLomo, which makes it a useful tool for training and adapting large models when resources are limited.