MEM-MCL: Evolutionary Merging & Curriculum Learning
- The paper presents a three-phase pipeline that integrates instruction tuning, evolutionary model merging, and curriculum learning to enhance multi-task sentiment analysis.
- The MEM-MCL framework utilizes LoRA for expert model creation and an evolutionary algorithm to merge models based on weak data, achieving 2–5 point accuracy gains.
- Curriculum scheduling driven by metadata ranks tasks by difficulty, improving in-context learning across 15 diverse sentiment subtasks.
The Multi-stage Evolutionary Model Merging with Metadata-Driven Curriculum Learning (MEM-MCL) framework is a three-phase pipeline designed to convert a general-purpose LLM into a unified, sentiment-specialized model that supports a broad suite of sentiment analysis subtasks. MEM-MCL integrates supervised instruction tuning, evolutionary parameter merging guided by weak data, and metadata-driven curriculum learning, yielding improvements in multi-task sentiment analysis accuracy and robustness across heterogeneous inputs (Inoshita et al., 11 Jan 2026).
1. Pipeline Architecture and Task Coverage
MEM-MCL consists of the following sequential processing stages:
- Supervised Fine-Tuning (SFT): The base LLM undergoes instruction-tuning with Low-Rank Adaptation (LoRA) to generate distinct expert models—each dedicated to a single sentiment analysis task.
- Multi-stage Evolutionary Model Merging (MEM): Within each group of correlated tasks—Sentiment Classification (SC), Aspect Based Sentiment Analysis (ABSA), and Multifaceted Subjectivity (MAST)—an evolutionary algorithm determines weighted combinations of task experts to merge into a group-level model. A second evolutionary merging consolidates group models into a single final model.
- Metadata-Driven Curriculum Learning (MCL): During inference, the unified model is exposed to each task in order of ascending difficulty, computed as a meta-feature–weighted score, thus organizing in-context learning by relative complexity.
The MEM-MCL system is demonstrated across 15 sentiment subtasks, including binary/multi-class sentiment classification, various ABSA forms such as ATSA and ASQP, and subjective/emotional analyses involving categorical emotions, hate speech, offensiveness, and irony (Inoshita et al., 11 Jan 2026).
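The three phases can be sketched end-to-end as a toy pipeline. This is a minimal illustration under simplifying assumptions (model parameters reduced to single floats, function names hypothetical), not the authors' code:

```python
# Toy sketch of the three MEM-MCL phases. All names are illustrative
# stand-ins; "parameters" are single floats so the data flow stays visible.

def sft_expert(base: float, task_delta: float) -> float:
    """Phase 1 (SFT): a LoRA-style expert = frozen base + task-specific delta."""
    return base + task_delta

def merge(params: list[float], weights: list[float]) -> float:
    """Phase 2 (MEM): simplex-weighted blend of expert/group parameters."""
    assert abs(sum(weights) - 1.0) < 1e-9  # weights live on the simplex
    return sum(w * p for w, p in zip(weights, params))

def curriculum(tasks: dict[str, float]) -> list[str]:
    """Phase 3 (MCL): order tasks easiest-first for in-context inference."""
    return sorted(tasks, key=tasks.get)

base = 0.0
experts = {"SC": sft_expert(base, 0.1), "ABSA": sft_expert(base, 0.3)}
group = merge(list(experts.values()), [0.5, 0.5])           # group-level merge
final = merge([group, sft_expert(base, 0.2)], [0.7, 0.3])   # second-stage merge
order = curriculum({"SC-binary": 1.0, "ASQP": 3.5, "ATSA": 2.0})
print(round(final, 6), order)
```

The two `merge` calls mirror the two evolutionary merging stages; in the real framework the weights are found by evolutionary search rather than fixed by hand.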
2. Expert Model Construction with Instruction-Tuned LoRA
For each sentiment task $t$, the procedure is as follows:
- Input data $D_t = \{(x_i, y_i, a_i)\}_{i=1}^{N_t}$, where $a_i$ designates an aspect/category for ABSA tasks.
- Each sample is converted into a prompt $p_i = \mathrm{Prompt}(x_i, a_i)$.
- LoRA adaptation is applied to the base LLM (parameters $\theta_0$), resulting in expert parameters $\theta_t$ for task $t$.
- The predictive model is $f_{\theta_t}$, and the cross-entropy loss is minimized: $\mathcal{L}_t(\theta_t) = -\sum_{i=1}^{N_t} \log P_{\theta_t}(y_i \mid p_i)$.
- Each completed model is designated as expert $E_t$.
This process produces a set of instruction-tuned experts capable of specialized performance on their respective subtasks (Inoshita et al., 11 Jan 2026).
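As a concrete sketch of the LoRA step (an illustration, not the paper's code), the base weight stays frozen while a trainable rank-r product B·A, scaled by alpha/r, is added at forward time; the rank-8, alpha-16 values below mirror the paper's reported SFT setting:

```python
import numpy as np

# Minimal LoRA sketch (illustrative, not the authors' implementation):
# instead of updating the full weight W, train a rank-r factorization
# B @ A and add it, scaled by alpha / r, at forward time.
rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 16              # rank 8, alpha 16, per the paper's SFT setting

W = rng.normal(size=(d, d))          # frozen base weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Effective weight is W + (alpha / r) * B @ A; only A and B are trained.
    return x @ (W + (alpha / r) * B @ A).T

x = rng.normal(size=(1, d))
# Because B starts at zero, the adapted model initially matches the base model.
print(np.allclose(lora_forward(x), x @ W.T))  # True
```

Zero-initializing B is the standard LoRA choice: training starts from exactly the base model's behavior, and only the small A, B matrices need to be stored per expert.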
3. Evolutionary Model Merging and Weak Data Optimization
MEM partitions the collection of expert models into three groups corresponding to related subtasks: SC ($G_{\mathrm{SC}}$), ABSA ($G_{\mathrm{ABSA}}$), and MAST ($G_{\mathrm{MAST}}$). The evolutionary merging procedure is distinguished by:
- Weak Data Extraction: For each expert $E_t$, infer over its own training data $D_t$ to extract $W_t$, the set of misclassified items. For group $g$, define $W_g = \bigcup_{t \in g} W_t$.
- Weight-based Parameter Merging: Seeking weights $\lambda$ on the probability simplex, the merged group parameters blend expert models: $\theta_g = \sum_{t \in g} \lambda_t \theta_t$.
- Evolutionary Search Algorithm: A population of candidate weight vectors is evolved over generations. Fitness is the mean classification accuracy (or F1) on $W_g$: $F(\lambda) = \mathrm{Acc}\big(f_{\theta_g(\lambda)}, W_g\big)$.
- Second-stage Group Merging: The final MEM model is constructed as a simplex-weighted combination of group-level models, $\theta^\star = \sum_g \mu_g \theta_g$, chosen to maximize aggregate weak-data fitness.
Ablation reveals that removing weak-data–guided evolutionary fitness reduces average accuracy by approximately 3 points, highlighting the efficacy of focusing optimization on error-prone samples (Inoshita et al., 11 Jan 2026).
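The inner MEM loop can be illustrated with a toy evolutionary search over simplex weights (an illustration under made-up data, not the authors' implementation; only the generations=20, population=20, single-point crossover, and mutation-rate-0.1 settings follow the paper):

```python
import numpy as np

# Toy MEM inner loop: evolve simplex weights that blend expert parameter
# vectors, scoring each candidate by accuracy on the pooled weak set.
rng = np.random.default_rng(0)

experts = [rng.normal(size=4) for _ in range(3)]     # stand-in expert params
X_weak = rng.normal(size=(40, 4))                    # pooled weak-data inputs
y_weak = X_weak @ (experts[0] + experts[2]) > 0      # toy labels favor a blend

def project_simplex(v):
    # Crude positivity-and-normalize projection onto the probability simplex.
    v = np.clip(v, 1e-6, None)
    return v / v.sum()

def fitness(lam):
    theta = sum(l * e for l, e in zip(lam, experts))  # weighted parameter merge
    return np.mean((X_weak @ theta > 0) == y_weak)    # accuracy on weak set

pop = [project_simplex(rng.random(3)) for _ in range(20)]
for _ in range(20):                                   # generations=20, pop=20
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                                # elitist selection
    children = []
    for p, q in zip(parents, parents[1:] + parents[:1]):
        cut = int(rng.integers(1, 3))                 # single-point crossover
        child = np.concatenate([p[:cut], q[cut:]])
        if rng.random() < 0.1:                        # mutation rate 0.1
            child = child + rng.normal(scale=0.1, size=3)
        children.append(project_simplex(child))
    pop = parents + children

best = max(pop, key=fitness)
print(round(float(fitness(best)), 2), best.round(2))
```

Because fitness is evaluated only on the weak set, the search concentrates on the experts' blind spots rather than on examples every expert already handles.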
4. Curriculum-Based Inference Scheduling via Metadata
Inference employs metadata-driven curriculum learning (MCL) to present subtasks to the final model in ascending order of empirically estimated difficulty:
- Difficulty Scoring: Each task $t$ is assigned a score $d(t) = w_1 C_t + w_2 V_t + w_3 X_t + w_4 S_t$, using meta-features: number of classes ($C_t$), dataset diversity ($V_t$), structural complexity ($X_t$), and subjectivity ($S_t$), with fixed weights $w_1, \dots, w_4$.
- Curriculum Scheduling: Tasks are sorted by ascending score, ensuring in-context learning proceeds from easiest to hardest, without parameter updates.
- Empirical Outcomes: The curriculum strategy yields a mean improvement of 1.5 points on 8 of 15 tasks relative to unstructured task inference (random order). The effect is particularly prominent for ACSA, ASD, and certain MAST subtasks.
This component exploits the LLM’s in-context processing ability, facilitating competence accrual as difficulty increases (Inoshita et al., 11 Jan 2026).
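The scheduling step reduces to a weighted sum plus a sort. A minimal sketch follows; the feature values and weights here are invented for illustration (the paper fixes its own), and only the four meta-feature names come from the source:

```python
# Sketch of metadata-driven curriculum scheduling. Feature values and the
# weight vector W are illustrative placeholders, not the paper's numbers.
TASKS = {
    # name: (num_classes C, diversity V, structural complexity X, subjectivity S)
    "SC-binary": (2, 0.3, 0.2, 0.4),
    "ATSA":      (3, 0.5, 0.6, 0.5),
    "ASQP":      (3, 0.7, 0.9, 0.5),
    "irony":     (2, 0.6, 0.3, 0.9),
}
W = (0.1, 0.3, 0.3, 0.3)  # fixed weights w1..w4 (illustrative values)

def difficulty(features):
    # d(t) = w1*C + w2*V + w3*X + w4*S
    return sum(w * f for w, f in zip(W, features))

schedule = sorted(TASKS, key=lambda t: difficulty(TASKS[t]))  # easiest first
print(schedule)  # ['SC-binary', 'irony', 'ATSA', 'ASQP']
```

No parameters are updated: the schedule only controls the order in which tasks reach the frozen merged model at inference time.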
5. Unified Optimization Objective
Although MEM is post-hoc weight blending, the joint process can be interpreted as minimizing the sum of expert task losses plus a regularization that enforces proximity between MEM group parameters and their expert counterparts:

$$\min_{\{\theta_t\},\,\{\lambda_g\}} \; \sum_t \mathcal{L}_t(\theta_t) \;+\; \beta \sum_g \sum_{t \in g} \lambda_{g,t}\,\lVert \theta_g - \theta_t \rVert^2$$

subject to $\lambda_{g,t} \ge 0$ and $\sum_{t \in g} \lambda_{g,t} = 1$, with $\theta_g = \sum_{t \in g} \lambda_{g,t}\theta_t$. In practical implementation, the SFT and MEM stages are decoupled: task-specific LoRA optimization is followed by an independent evolutionary weight search (Inoshita et al., 11 Jan 2026).
6. Experimental Design, Results, and Implementation
Datasets: 15 subtasks including SC (2/3/5-way), ABSA variants, and MAST (emotion, hate, offensive, irony).
Metrics: SC/MB use accuracy; ABSA uses micro-F1; MAST uses macro-F1.
Baselines: Base Llama-2-7B and LoRA-tuned single-task experts.
Findings:
- MEM improves on base LLM in 13/15 tasks, with 2–5 point absolute gains on ATSA/ACSA/ASQP.
- MEM matches or exceeds LoRA experts on 12/15 tasks (indicating cross-task synergy).
- MEM-MCL provides additional mean gains of 1.5 points over MEM on 8/15 tasks, especially for difficult categories.
- Weak-data–guided MEM is crucial: its removal yields a 3-point average loss.
- Random curriculum order reduces gains by about 1 point compared to metadata-driven ordering.
Implementation highlights:
| Component | Setting | Notes |
|---|---|---|
| Base Model | Llama-2-7B (open-source) | |
| LoRA (SFT) | rank 8, α=16, dropout=0.1, 3 epochs | Micro-batch=1, LR=3e-4, wd=0.05 |
| MEM | generations=20, pop=20, crossover=single-point, mutation=0.1 | ~2h per group |
| Curriculum | No trainable parameters | Pure inference |
| Hardware | 4× NVIDIA A100 (40 GB), PyTorch ≥2.0 | HuggingFace Transformers & PEFT |
| Compute Time | ~12h/expert, ~2h/MEM group | MCL negligible |
7. Significance and Context
MEM-MCL achieves unified, multi-task sentiment analysis within a single LLM by integrating dedicated task-specific instruction tuning, evolutionary parameter blending focused on model “blind spots,” and curriculum-based inference leveraging structured metadata. The combination outperforms conventional, non-specialized LLMs and matches or surpasses the performance of specialized expert models across a diverse set of sentiment tasks. This suggests that judiciously targeted parameter merging and curriculum learning can yield high accuracy and robustness in complex, multi-faceted NLP workflows (Inoshita et al., 11 Jan 2026).