
7B Fully Open Source Moxin-LLM/VLM -- From Pretraining to GRPO-based Reinforcement Learning Enhancement

Published 8 Dec 2024 in cs.CL, cs.AI, and cs.LG | (2412.06845v5)

Abstract: Recently, LLMs have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary LLMs like GPT-4 and GPT-o1, which have captured widespread attention in the AI community due to their remarkable performance and versatility. Simultaneously, open-source LLMs, such as LLaMA, have made great contributions to the ever-increasing popularity of LLMs due to the ease of customizing and deploying the models across diverse applications. Although open-source LLMs present unprecedented opportunities for innovation and research, the commercialization of LLMs has raised concerns about transparency, reproducibility, and safety. Many open-source LLMs fail to meet fundamental transparency requirements by withholding essential components like training code and data, which may hinder further innovations on LLMs. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed in adherence to principles of open science, open source, open data, and open access. We release the pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints, aiming to make continuous commitments to fully open-source LLMs. After pre-training the base model, we finetune the Moxin Base model with a SOTA post-training framework and instruction data to obtain the Moxin Instruct model. To improve the reasoning capability, we further finetune our Instruct model with chain-of-thought data distilled from DeepSeek R1, and then use Group Relative Policy Optimization (GRPO) following DeepSeek R1 to finetune our model, leading to the Moxin Reasoning model. Moreover, we develop our vision LLM based on our Moxin model. Experiments show that our models achieve superior performance in various evaluations such as zero-shot evaluation, few-shot evaluation, and CoT evaluation.
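The GRPO step mentioned in the abstract replaces a learned value baseline with statistics computed over a group of sampled completions per prompt. A minimal sketch of that group-relative advantage computation, assuming scalar rewards for each completion (function name and inputs are illustrative, not the paper's actual pipeline):

```python
# Sketch of GRPO's group-relative advantage: sample G completions
# per prompt, score each with a reward model, then normalize each
# reward against the group's own mean and standard deviation.
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Return (r - mean) / std for each reward in the group.
    When all rewards are identical (std = 0), advantages are zero."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 1.0
    sigma = sigma or 1.0  # guard against division by zero
    return [(r - mu) / sigma for r in rewards]

# Four completions for one prompt, scored by some reward model
advs = group_relative_advantages([1.0, 0.0, 0.5, 1.0])
```

The normalized advantages then weight the policy-gradient update in place of a critic's value estimates, which is what lets GRPO drop the separate value network used by PPO.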

Summary

  • The paper demonstrates that a fully open-source LLM can achieve robust performance while adhering to complete transparency guidelines under the MOF.
  • It describes architectural choices such as Grouped-Query Attention and Sliding Window Attention that improve model efficiency and the handling of long sequences.
  • The report documents strong zero-shot and few-shot evaluation results relative to comparable models, promoting reproducibility and open science.

Analyzing the Comprehensive Openness of Moxin-LLM: Technical Innovations and Implications

The technical report on Moxin-LLM presents the development and implications of Moxin 7B, a fully open-source LLM designed to adhere to the Model Openness Framework (MOF). The MOF is critical in promoting transparency, reproducibility, and complete access to model components, such as training datasets and code, which are often withheld by nominally open-source models. The paper's pivotal focus is on demonstrating that a commitment to openness does not necessarily compromise model performance, as evidenced by the robust results of Moxin 7B.

Overview of Moxin 7B Development

Moxin 7B is constructed by extending the Mistral model architecture, leveraging techniques such as Grouped-Query Attention (GQA) and Sliding Window Attention (SWA) to optimize both performance and inference speed. The architecture is extended to 36 blocks, incorporating significant innovations in attention mechanisms that allow for efficient handling of long sequences.
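Sliding Window Attention restricts each token to attend only to the most recent W tokens, combining causality with a fixed window; long-range information still propagates across layers because each layer's window compounds. A minimal sketch of the mask such an attention layer would apply (parameter names are illustrative):

```python
# Sketch of a Mistral-style sliding-window attention mask:
# token i may attend to token j only when i - W < j <= i,
# i.e. the mask is causal AND limited to the last W positions.
def sliding_window_mask(seq_len, window):
    return [
        [(i - window) < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(6, 3)
# Row 5 is True only at positions 3, 4, 5: the last 3 tokens.
```

Grouped-Query Attention is complementary: several query heads share one key/value head, shrinking the KV cache and speeding up inference without the full quality cost of multi-query attention.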

Data preparation for Moxin 7B draws on a carefully curated mix of open-source datasets such as SlimPajama and DCLM-BASELINE, addressing common issues like duplication and low-quality content with advanced methods, including MinHashLSH for deduplication. This level of data curation ensures high-quality inputs during model training, thereby enhancing performance on a broad array of language processing tasks.
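MinHashLSH-style deduplication works by hashing each document's word shingles into a short signature whose agreement rate estimates Jaccard similarity; LSH then buckets similar signatures so near-duplicates can be found without pairwise comparison. A from-scratch sketch of the MinHash part, using only the standard library (function names, shingle size, and permutation count are illustrative, not the paper's actual settings):

```python
# Sketch of MinHash-based near-duplicate detection, the idea behind
# MinHashLSH deduplication. Each "permutation" is simulated with a
# seeded hash; the signature keeps the minimum hash per permutation.
import hashlib

def shingles(text, k=3):
    """Set of overlapping k-word shingles from a document."""
    words = text.split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def minhash_signature(items, num_perm=64):
    """One min-hash value per simulated permutation."""
    return [
        min(
            int.from_bytes(
                hashlib.sha1(f"{seed}:{s}".encode()).digest()[:8], "big"
            )
            for s in items
        )
        for seed in range(num_perm)
    ]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = shingles("the quick brown fox jumps over the lazy dog")
b = shingles("the quick brown fox jumps over the lazy cat")
sig_a, sig_b = minhash_signature(a), minhash_signature(b)
# Near-duplicate pair: estimated_jaccard(sig_a, sig_b) is high.
```

In a production pipeline the signatures would be banded into LSH buckets (as libraries like datasketch do), so only documents sharing a bucket are ever compared directly.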

Performance Evaluation

Moxin 7B's performance was assessed against existing models such as Mistral-7B, LLaMA 2-7B, and Gemma-7B, using benchmarks like AI2 Reasoning Challenge, HellaSwag, MMLU, and others, across zero-shot and few-shot evaluations.

  • Zero-Shot Evaluation: Moxin-7B-finetuned achieved superior performance in complex reasoning tasks, notably PIQA, where its score rose from 78.07% (base model) to 82.24%, surpassing other models in the 7B class.
  • Few-Shot Evaluation: The model demonstrated competitive scores, outperforming several state-of-the-art models thanks to its effective fine-tuning process, evidencing the training's impact on its few-shot learning capabilities.

The Moxin-7B-chat model, aligned via supervised fine-tuning, also scored competitively against other models on MT-Bench, reinforcing its utility as an interactive AI assistant.

Implications and Future Directions

The transparent development of Moxin 7B underscores a potential paradigm shift within the open-source LLM community. By fully disclosing training methodologies, datasets, and model configurations, Moxin 7B sets a precedent for enhancing collaborative research and innovation. This comprehensive openness fosters an inclusive and sustainable AI research environment, facilitating reproducibility and allowing researchers worldwide to build on robust model baselines.

The use of the MOF as a guideline appears critical in combating so-called "openwashing" practices, ensuring that models labeled as open-source truly adhere to open science principles. The alignment of Moxin 7B with these standards strengthens the case for more AI developers to embrace this ethos.

Looking ahead, the research points to avenues for improving LLMs through further gains in training-data quality and through aligning models to diverse linguistic and application-specific contexts. Extending these advancements could significantly increase the practical utility of open-source LLMs across sectors, from industrial applications to academic research.

In conclusion, the Moxin-LLM technical report demonstrates the value of full transparency in the development of AI models, supported by solid performance across language processing benchmarks. It posits an optimistic future for AI research characterized by cooperative development, accessibility, and openly shared innovation.
