Evolutionary Optimization of Model Merging Recipes
Abstract: Large language models (LLMs) have become increasingly capable, but their development often requires substantial computational resources. While model merging has emerged as a promising, cost-effective approach for creating new models by combining existing ones, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with math reasoning capabilities. Surprisingly, our Japanese math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally aware Japanese VLM generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese VLMs. This work not only contributes new state-of-the-art models back to the open-source community, but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.
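As a minimal illustration of the parameter-space half of this idea, the sketch below uses CMA-ES (via Hansen's pycma package) to evolve per-model mixing weights for a simple linear merge of model parameters. Note that `evaluate_on_task` is a hypothetical fitness callback standing in for a held-out benchmark score, and the plain linear merge is a deliberate simplification: the recipes evolved in the paper involve richer merging operators and data-flow-space layer routing.

```python
import cma  # Hansen's pycma: pip install cma

def merge_state_dicts(state_dicts, weights):
    """Linearly combine matching parameters from several models."""
    return {
        key: sum(w * sd[key] for w, sd in zip(weights, state_dicts))
        for key in state_dicts[0]
    }

def evolve_merge_recipe(state_dicts, evaluate_on_task, generations=50):
    """Evolve mixing weights that maximize a task score via CMA-ES.

    `evaluate_on_task` is a hypothetical callback: it takes a merged
    state dict and returns a scalar benchmark score (higher is better).
    """
    x0 = [1.0 / len(state_dicts)] * len(state_dicts)  # start at uniform average
    es = cma.CMAEvolutionStrategy(x0, 0.3)  # 0.3 = initial search spread (sigma0)
    for _ in range(generations):
        candidates = es.ask()
        # CMA-ES minimizes, so negate the benchmark score.
        losses = [
            -evaluate_on_task(merge_state_dicts(state_dicts, c))
            for c in candidates
        ]
        es.tell(candidates, losses)
    return es.result.xbest  # best mixing weights found
```

In a sketch like this, fitness evaluation dominates the cost, since each candidate requires materializing the merged weights and running the benchmark; the same ask/tell loop could in principle be repurposed for the data-flow-space search by evolving layer-routing decisions rather than mixing weights.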