MEM-MCL: Evolutionary Merging & Curriculum Learning
- The paper presents a three-phase pipeline that integrates instruction tuning, evolutionary model merging, and curriculum learning to enhance multi-task sentiment analysis.
- The MEM-MCL framework utilizes LoRA for expert model creation and an evolutionary algorithm to merge models based on weak data, achieving 2–5 point accuracy gains.
- Curriculum scheduling driven by metadata ranks tasks by difficulty, improving in-context learning across 15 diverse sentiment subtasks.
The Multi-stage Evolutionary Model Merging with Metadata-Driven Curriculum Learning (MEM-MCL) framework is a three-phase pipeline designed to convert a general-purpose LLM into a unified, sentiment-specialized model that supports a broad suite of sentiment analysis subtasks. MEM-MCL integrates supervised instruction tuning, evolutionary parameter merging guided by weak data, and metadata-driven curriculum learning, yielding improvements in multi-task sentiment analysis accuracy and robustness across heterogeneous inputs (Inoshita et al., 11 Jan 2026).
1. Pipeline Architecture and Task Coverage
MEM-MCL consists of the following sequential processing stages:
- Supervised Fine-Tuning (SFT): The base LLM undergoes instruction-tuning with Low-Rank Adaptation (LoRA) to generate distinct expert models—each dedicated to a single sentiment analysis task.
- Multi-stage Evolutionary Model Merging (MEM): Within each group of correlated tasks—Sentiment Classification (SC), Aspect Based Sentiment Analysis (ABSA), and Multifaceted Subjectivity (MAST)—an evolutionary algorithm determines weighted combinations of task experts to merge into a group-level model. A second evolutionary merging consolidates group models into a single final model.
- Metadata-Driven Curriculum Learning (MCL): During inference, the unified model is exposed to each task in order of ascending difficulty, computed as a meta-feature–weighted score, thus organizing in-context learning by relative complexity.
The MEM-MCL system is demonstrated across 15 sentiment subtasks, including binary/multi-class sentiment classification, various ABSA forms such as ATSA and ASQP, and subjective/emotional analyses involving categorical emotions, hate speech, offensiveness, and irony (Inoshita et al., 11 Jan 2026).
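The three phases can be sketched end-to-end as a toy pipeline. This is a minimal illustration under simplifying assumptions (model parameters reduced to single floats, function names hypothetical), not the authors' code:

```python
# Toy sketch of the three MEM-MCL phases. All names are illustrative
# stand-ins; "parameters" are single floats so the data flow stays visible.

def sft_expert(base: float, task_delta: float) -> float:
    """Phase 1 (SFT): a LoRA-style expert = frozen base + task-specific delta."""
    return base + task_delta

def merge(params: list[float], weights: list[float]) -> float:
    """Phase 2 (MEM): simplex-weighted blend of expert/group parameters."""
    assert abs(sum(weights) - 1.0) < 1e-9  # weights live on the simplex
    return sum(w * p for w, p in zip(weights, params))

def curriculum(tasks: dict[str, float]) -> list[str]:
    """Phase 3 (MCL): order tasks easiest-first for in-context inference."""
    return sorted(tasks, key=tasks.get)

base = 0.0
experts = {"SC": sft_expert(base, 0.1), "ABSA": sft_expert(base, 0.3)}
group = merge(list(experts.values()), [0.5, 0.5])           # group-level merge
final = merge([group, sft_expert(base, 0.2)], [0.7, 0.3])   # second-stage merge
order = curriculum({"SC-binary": 1.0, "ASQP": 3.5, "ATSA": 2.0})
print(round(final, 6), order)
```

The two `merge` calls mirror the two evolutionary merging stages; in the real framework the weights are found by evolutionary search rather than fixed by hand.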
2. Expert Model Construction with Instruction-Tuned LoRA
For each sentiment task $t$, the procedure is as follows:
- Input data $D_t = \{(x_i, y_i, a_i)\}_{i=1}^{N_t}$, where $a_i$ designates an aspect/category for ABSA tasks.
- Each sample is converted into a prompt $p_i = \mathrm{Prompt}(x_i, a_i)$.
- LoRA adaptation is applied to the base LLM (parameters $\theta_0$), resulting in expert parameters $\theta_t$ for task $t$.
- The predictive model is $f_{\theta_t}$, and the cross-entropy loss is minimized: $\mathcal{L}_t(\theta_t) = -\sum_{i=1}^{N_t} \log P_{\theta_t}(y_i \mid p_i)$.
- Each completed model is designated as expert $E_t$.
This process produces a set of instruction-tuned experts capable of specialized performance on their respective subtasks (Inoshita et al., 11 Jan 2026).
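As a concrete sketch of the LoRA step (an illustration, not the paper's code), the base weight stays frozen while a trainable rank-r product B·A, scaled by alpha/r, is added at forward time; the rank-8, alpha-16 values below mirror the paper's reported SFT setting:

```python
import numpy as np

# Minimal LoRA sketch (illustrative, not the authors' implementation):
# instead of updating the full weight W, train a rank-r factorization
# B @ A and add it, scaled by alpha / r, at forward time.
rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 16              # rank 8, alpha 16, per the paper's SFT setting

W = rng.normal(size=(d, d))          # frozen base weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Effective weight is W + (alpha / r) * B @ A; only A and B are trained.
    return x @ (W + (alpha / r) * B @ A).T

x = rng.normal(size=(1, d))
# Because B starts at zero, the adapted model initially matches the base model.
print(np.allclose(lora_forward(x), x @ W.T))  # True
```

Zero-initializing B is the standard LoRA choice: training starts from exactly the base model's behavior, and only the small A, B matrices need to be stored per expert.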
3. Evolutionary Model Merging and Weak Data Optimization
MEM partitions the collection of expert models into three groups corresponding to related subtasks: SC ($G_{\mathrm{SC}}$), ABSA ($G_{\mathrm{ABSA}}$), and MAST ($G_{\mathrm{MAST}}$). The evolutionary merging procedure is distinguished by:
- Weak Data Extraction: For each expert $E_t$, infer over its own training data $D_t$ to extract $W_t$, the set of misclassified items. For group $g$, define $W_g = \bigcup_{t \in g} W_t$.
- Weight-based Parameter Merging: Seeking weights $\lambda$ on the probability simplex, the merged group parameters blend expert models: $\theta_g = \sum_{t \in g} \lambda_t \theta_t$.
- Evolutionary Search Algorithm: A population of candidate weight vectors is evolved over generations. Fitness is the mean classification accuracy (or F1) on $W_g$: $F(\lambda) = \mathrm{Acc}\big(f_{\theta_g(\lambda)}, W_g\big)$.
- Second-stage Group Merging: The final MEM model is constructed as a simplex-weighted combination of group-level models, $\theta^\star = \sum_g \mu_g \theta_g$, chosen to maximize aggregate weak-data fitness.
Ablation reveals that removing weak-data–guided evolutionary fitness reduces average accuracy by approximately 3 points, highlighting the efficacy of focusing optimization on error-prone samples (Inoshita et al., 11 Jan 2026).
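The inner MEM loop can be illustrated with a toy evolutionary search over simplex weights (an illustration under made-up data, not the authors' implementation; only the generations=20, population=20, single-point crossover, and mutation-rate-0.1 settings follow the paper):

```python
import numpy as np

# Toy MEM inner loop: evolve simplex weights that blend expert parameter
# vectors, scoring each candidate by accuracy on the pooled weak set.
rng = np.random.default_rng(0)

experts = [rng.normal(size=4) for _ in range(3)]     # stand-in expert params
X_weak = rng.normal(size=(40, 4))                    # pooled weak-data inputs
y_weak = X_weak @ (experts[0] + experts[2]) > 0      # toy labels favor a blend

def project_simplex(v):
    # Crude positivity-and-normalize projection onto the probability simplex.
    v = np.clip(v, 1e-6, None)
    return v / v.sum()

def fitness(lam):
    theta = sum(l * e for l, e in zip(lam, experts))  # weighted parameter merge
    return np.mean((X_weak @ theta > 0) == y_weak)    # accuracy on weak set

pop = [project_simplex(rng.random(3)) for _ in range(20)]
for _ in range(20):                                   # generations=20, pop=20
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                                # elitist selection
    children = []
    for p, q in zip(parents, parents[1:] + parents[:1]):
        cut = int(rng.integers(1, 3))                 # single-point crossover
        child = np.concatenate([p[:cut], q[cut:]])
        if rng.random() < 0.1:                        # mutation rate 0.1
            child = child + rng.normal(scale=0.1, size=3)
        children.append(project_simplex(child))
    pop = parents + children

best = max(pop, key=fitness)
print(round(float(fitness(best)), 2), best.round(2))
```

Because fitness is evaluated only on the weak set, the search concentrates on the experts' blind spots rather than on examples every expert already handles.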
4. Curriculum-Based Inference Scheduling via Metadata
Inference employs metadata-driven curriculum learning (MCL) to present subtasks to the final model in ascending order of empirically estimated difficulty:
- Difficulty Scoring: Each task $t$ is assigned a score $d(t) = w_1 C_t + w_2 V_t + w_3 X_t + w_4 S_t$, using meta-features: number of classes ($C_t$), dataset diversity ($V_t$), structural complexity ($X_t$), and subjectivity ($S_t$), with fixed weights $w_1, \dots, w_4$.
- Curriculum Scheduling: Tasks are sorted by ascending score, ensuring in-context learning proceeds from easiest to hardest, without parameter updates.
- Empirical Outcomes: The curriculum strategy yields a mean improvement of 1.5 points on 8 of 15 tasks relative to unstructured task inference (random order). The effect is particularly prominent for ACSA, ASD, and certain MAST subtasks.
This component exploits the LLM’s in-context processing ability, facilitating competence accrual as difficulty increases (Inoshita et al., 11 Jan 2026).
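The scheduling step reduces to a weighted sum plus a sort. A minimal sketch follows; the feature values and weights here are invented for illustration (the paper fixes its own), and only the four meta-feature names come from the source:

```python
# Sketch of metadata-driven curriculum scheduling. Feature values and the
# weight vector W are illustrative placeholders, not the paper's numbers.
TASKS = {
    # name: (num_classes C, diversity V, structural complexity X, subjectivity S)
    "SC-binary": (2, 0.3, 0.2, 0.4),
    "ATSA":      (3, 0.5, 0.6, 0.5),
    "ASQP":      (3, 0.7, 0.9, 0.5),
    "irony":     (2, 0.6, 0.3, 0.9),
}
W = (0.1, 0.3, 0.3, 0.3)  # fixed weights w1..w4 (illustrative values)

def difficulty(features):
    # d(t) = w1*C + w2*V + w3*X + w4*S
    return sum(w * f for w, f in zip(W, features))

schedule = sorted(TASKS, key=lambda t: difficulty(TASKS[t]))  # easiest first
print(schedule)  # ['SC-binary', 'irony', 'ATSA', 'ASQP']
```

No parameters are updated: the schedule only controls the order in which tasks reach the frozen merged model at inference time.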
5. Unified Optimization Objective
Although MEM is post-hoc weight blending, the joint process can be interpreted as minimizing the sum of expert task losses plus a regularization that enforces proximity between MEM group parameters and their expert counterparts:

$$\min_{\{\theta_t\},\,\{\lambda_g\}} \; \sum_t \mathcal{L}_t(\theta_t) \;+\; \beta \sum_g \sum_{t \in g} \lambda_{g,t}\,\lVert \theta_g - \theta_t \rVert^2$$

subject to $\lambda_{g,t} \ge 0$ and $\sum_{t \in g} \lambda_{g,t} = 1$, with $\theta_g = \sum_{t \in g} \lambda_{g,t}\theta_t$. In practical implementation, the SFT and MEM stages are decoupled: task-specific LoRA optimization is followed by an independent evolutionary weight search (Inoshita et al., 11 Jan 2026).
6. Experimental Design, Results, and Implementation
Datasets: 15 subtasks including SC (2/3/5-way), ABSA variants, and MAST (emotion, hate, offensive, irony).
Metrics: SC/MB use accuracy; ABSA uses micro-F1; MAST uses macro-F1.
Baselines: Base Llama-2-7B and LoRA-tuned single-task experts.
Findings:
- MEM improves on base LLM in 13/15 tasks, with 2–5 point absolute gains on ATSA/ACSA/ASQP.
- MEM matches or exceeds LoRA experts on 12/15 tasks (indicating cross-task synergy).
- MEM-MCL provides additional mean gains of 1.5 points over MEM on 8/15 tasks, especially for difficult categories.
- Weak-data–guided MEM is crucial: its removal yields a 3-point average loss.
- Random curriculum order reduces gains by about 1 point compared to metadata-driven ordering.
Implementation highlights:
| Component | Setting | Notes |
|---|---|---|
| Base Model | Llama-2-7B (open-source) | |
| LoRA (SFT) | rank 8, α=16, dropout=0.1, 3 epochs | Micro-batch=1, LR=3e-4, wd=0.05 |
| MEM | generations=20, pop=20, crossover=single-point, mutation=0.1 | ~2h per group |
| Curriculum | No trainable parameters | Pure inference |
| Hardware | 4× NVIDIA A100 (40 GB), PyTorch ≥2.0 | HuggingFace Transformers & PEFT |
| Compute Time | ~12h/expert, ~2h/MEM group | MCL negligible |
7. Significance and Context
MEM-MCL achieves unified, multi-task sentiment analysis within a single LLM by integrating dedicated task-specific instruction tuning, evolutionary parameter blending focused on model “blind spots,” and curriculum-based inference leveraging structured metadata. The combination outperforms conventional, non-specialized LLMs and matches or surpasses the performance of specialized expert models across a diverse set of sentiment tasks. This suggests that judiciously targeted parameter merging and curriculum learning can yield high accuracy and robustness in complex, multi-faceted NLP workflows (Inoshita et al., 11 Jan 2026).