MEM-MCL: Evolutionary Merging & Curriculum Learning

Updated 18 January 2026
  • The paper presents a three-phase pipeline that integrates instruction tuning, evolutionary model merging, and curriculum learning to enhance multi-task sentiment analysis.
  • The MEM-MCL framework utilizes LoRA for expert model creation and an evolutionary algorithm to merge models based on weak data, achieving 2–5 point accuracy gains.
  • Curriculum scheduling driven by metadata ranks tasks by difficulty, improving in-context learning across 15 diverse sentiment subtasks.

The Multi-stage Evolutionary Model Merging with Metadata-driven Curriculum Learning (MEM-MCL) framework is a three-phase pipeline designed to convert a general-purpose LLM into a unified, sentiment-specialized model that supports a broad suite of sentiment analysis subtasks. MEM-MCL integrates supervised instruction tuning, evolutionary parameter merging guided by weak data, and metadata-driven curriculum learning to yield improvements in multi-task sentiment analysis accuracy and robustness across heterogeneous inputs (Inoshita et al., 11 Jan 2026).

1. Pipeline Architecture and Task Coverage

MEM-MCL consists of the following sequential processing stages:

  1. Supervised Fine-Tuning (SFT): The base LLM undergoes instruction tuning with Low-Rank Adaptation (LoRA) to generate $T$ distinct expert models, each dedicated to a single sentiment analysis task.
  2. Multi-stage Evolutionary Model Merging (MEM): The task experts are grouped into Sentiment Classification (SC), Aspect-Based Sentiment Analysis (ABSA), and Multifaceted Subjectivity (MAST); within each group, an evolutionary algorithm determines weighted combinations of task experts to merge into a group-level model. A second evolutionary merging step consolidates the group models into a single final model.
  3. Metadata-Driven Curriculum Learning (MCL): During inference, the unified model is exposed to each task in order of ascending difficulty, computed as a meta-feature-weighted score, thus organizing in-context learning by relative complexity.

The MEM-MCL system is demonstrated across 15 sentiment subtasks, including binary/multi-class sentiment classification, various ABSA forms such as ATSA and ASQP, and subjective/emotional analyses involving categorical emotions, hate speech, offensiveness, and irony (Inoshita et al., 11 Jan 2026).
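The three phases can be sketched as a single driver loop. The helper names below (`train_lora_expert`, `evolve_merge`, `difficulty`) are hypothetical stand-ins, not the paper's code; trivial stubs are included only so the control flow is concrete and runnable:

```python
# Trivial stand-ins for the three MEM-MCL phases (illustrative only).
def train_lora_expert(base, task):            # Phase 1: SFT with LoRA
    return {"base": base, "task": task["name"]}

def evolve_merge(models):                     # Phase 2: evolutionary merging
    return {"merged": models}

def difficulty(task):                         # Phase 3: metadata-based score
    return task["score"]

def run_mem_mcl(base_model, tasks, groups):
    # Phase 1: one instruction-tuned LoRA expert per task.
    experts = {t["name"]: train_lora_expert(base_model, t) for t in tasks}
    # Phase 2: merge experts within each group, then merge the group models.
    group_models = [evolve_merge([experts[n] for n in names])
                    for names in groups.values()]
    final_model = evolve_merge(group_models)
    # Phase 3: schedule inference from easiest to hardest task.
    order = [t["name"] for t in sorted(tasks, key=difficulty)]
    return final_model, order

# Hypothetical task names and difficulty scores.
tasks = [{"name": "SC-2", "score": 5.2},
         {"name": "ATSA", "score": 9.0},
         {"name": "Irony", "score": 8.8}]
groups = {"SC": ["SC-2"], "ABSA": ["ATSA"], "MAST": ["Irony"]}
model, order = run_mem_mcl("llama-2-7b", tasks, groups)
```

The stubs make the staging explicit: merging happens twice (within groups, then across groups), and curriculum ordering is pure scheduling with no parameter updates.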

2. Expert Model Construction with Instruction-Tuned LoRA

For each sentiment task $t \in \{1,\dots,T\}$, the procedure is as follows:

  • Input data $D_t = \{(I_{t,i}, A_{t,i}, y_{t,i})\}_{i=1}^{N_t}$, where $A_{t,i}$ designates an aspect/category for ABSA tasks.
  • Each sample is converted into a prompt $P_{t,i} = \mathrm{Prompt}(R_t, I_{t,i}, A_{t,i})$.
  • LoRA adaptation is applied to the base LLM ($\theta_0$), resulting in expert parameters $\theta_0 + \Delta\theta_t$ for task $t$.
  • The predictive model is $\mathrm{LLM}^{\mathrm{LoRA}_{\theta_0+\Delta\theta_t}}(P_{t,i})$, and the cross-entropy loss is minimized:

$$\mathcal{L}_t(\Delta\theta_t) = -\sum_{i=1}^{N_t} \sum_{c=1}^{C_t} \mathbf{1}[y_{t,i}=c]\log\hat p_{t,i}(c)$$

  • Each completed $\theta_0 + \Delta\theta_t$ model is designated as expert $M_t$.

This process produces a set of instruction-tuned experts capable of specialized performance on their respective subtasks (Inoshita et al., 11 Jan 2026).
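As a concrete instance of the loss above, a minimal pure-Python computation of $\mathcal{L}_t$ over a toy batch (the predicted class distributions are illustrative values, not model outputs):

```python
import math

def cross_entropy(probs, labels):
    """L_t: summed negative log-likelihood of the gold class over the batch."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))

# Toy batch for one task t with C_t = 3 classes.
probs = [[0.7, 0.2, 0.1],   # gold class 0
         [0.1, 0.8, 0.1]]   # gold class 1
labels = [0, 1]
loss = cross_entropy(probs, labels)   # -(log 0.7 + log 0.8) ≈ 0.580
```

In the actual pipeline this loss is minimized over the LoRA parameters $\Delta\theta_t$ only, with the base weights $\theta_0$ frozen.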

3. Evolutionary Model Merging and Weak Data Optimization

MEM partitions the collection of expert models into three groups corresponding to related subtasks: SC ($\{M_1,\dots,M_5\}$), ABSA ($\{M_6,\dots,M_{11}\}$), and MAST ($\{M_{12},\dots,M_{15}\}$). The evolutionary merging procedure is distinguished by:

  • Weak Data Extraction: For each expert $M_t$, infer over its own training data to extract $D_t^{\mathrm{weak}}$, the set of misclassified items. For group $g$, define $D_g^{\mathrm{weak}} = \bigcup_{t\in g} D_t^{\mathrm{weak}}$.
  • Weight-based Parameter Merging: With weights $\mathbf{w}_g$ constrained to the probability simplex, the merged group parameters $\theta_g(\mathbf{w}_g) = \sum_{t \in g} w_t \theta_t$ blend the expert models.
  • Evolutionary Search Algorithm: A population of candidate weight vectors is evolved over generations. Fitness $f_g(\mathbf{w})$ is the mean classification accuracy (or F1) on $D_g^{\mathrm{weak}}$:

$$f_g(\mathbf{w}) = \frac{1}{|D_g^{\mathrm{weak}}|}\sum_{(x,y)\in D_g^{\mathrm{weak}}} \mathbf{1}\left[\arg\max_c M_g(\mathbf{w})(x) = y\right]$$

  • Second-stage Group Merging: The final MEM model is constructed as a simplex-weighted combination of group-level models to maximize aggregate weak-data fitness.

Ablation reveals that removing weak-data–guided evolutionary fitness reduces average accuracy by approximately 3 points, highlighting the efficacy of focusing optimization on error-prone samples (Inoshita et al., 11 Jan 2026).
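A minimal sketch of the simplex-constrained evolutionary weight search, assuming a generic `fitness` callback in place of accuracy on $D_g^{\mathrm{weak}}$. Population size, generation count, single-point crossover, and mutation rate follow the reported settings; the toy experts and target are illustrative:

```python
import random

def merge(experts, w):
    # theta_g(w) = sum_t w_t * theta_t, element-wise over parameter vectors.
    return [sum(wt * th[i] for wt, th in zip(w, experts))
            for i in range(len(experts[0]))]

def simplex(w):
    # Clip to non-negative values and renormalize so the weights sum to 1.
    w = [max(x, 0.0) for x in w]
    s = sum(w)
    return [x / s for x in w] if s > 0 else [1.0 / len(w)] * len(w)

def evolve_weights(experts, fitness, pop=20, gens=20, mut=0.1, seed=0):
    rng = random.Random(seed)
    T = len(experts)
    popn = [simplex([rng.random() for _ in range(T)]) for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=lambda w: fitness(merge(experts, w)), reverse=True)
        elite = popn[: pop // 2]                  # keep the fitter half
        while len(elite) < pop:
            a, b = rng.sample(popn[: pop // 2], 2)
            cut = rng.randrange(1, T)             # single-point crossover
            child = [x + rng.gauss(0.0, mut) for x in a[:cut] + b[cut:]]
            elite.append(simplex(child))          # mutate, re-project
        popn = elite
    return max(popn, key=lambda w: fitness(merge(experts, w)))

# Toy problem: two "experts" whose ideal blend is w = (0.3, 0.7);
# fitness stands in for accuracy on the weak set D_g^weak.
experts = [[1.0, 0.0], [0.0, 1.0]]
target = [0.3, 0.7]
fitness = lambda theta: -sum((a - b) ** 2 for a, b in zip(theta, target))
best = evolve_weights(experts, fitness)
```

Because the fitter half is carried over unchanged each generation, the best-so-far candidate is never lost, and re-projecting children onto the simplex keeps every merged model a convex combination of the experts.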

4. Curriculum-Based Inference Scheduling via Metadata

Inference employs metadata-driven curriculum learning (MCL) to present subtasks to the final model in ascending order of empirically estimated difficulty:

  • Difficulty Scoring: Each task $t$ is assigned a score

$$\mathrm{Score}(t) = \alpha_1 C_t + \alpha_2 V_t + \alpha_3 Z_t + \alpha_4 S_t$$

using meta-features: number of classes ($C_t$), dataset diversity ($V_t$), structural complexity ($Z_t$), and subjectivity ($S_t$), with fixed weights $(1.0, 0.5, 2.0, 5.0)$.

  • Curriculum Scheduling: Tasks are sorted by ascending score, ensuring in-context learning proceeds from easiest to hardest, without parameter updates.
  • Empirical Outcomes: The curriculum strategy yields a mean improvement of 1.5 points on 8 of 15 tasks relative to unstructured task inference (random order). The effect is particularly prominent for ACSA, ASD, and certain MAST subtasks.

This component exploits the LLM's in-context processing ability, allowing competence to accrue as task difficulty increases (Inoshita et al., 11 Jan 2026).
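The scheduling step reduces to a weighted sum and a sort. In this sketch the meta-feature values $(C_t, V_t, Z_t, S_t)$ are hypothetical, while the weights $(1.0, 0.5, 2.0, 5.0)$ are the fixed values reported above:

```python
# Fixed meta-feature weights: (classes, diversity, complexity, subjectivity).
ALPHA = (1.0, 0.5, 2.0, 5.0)

def score(task):
    # Score(t) = a1*C_t + a2*V_t + a3*Z_t + a4*S_t
    return sum(a * f for a, f in zip(ALPHA, task["features"]))

# Hypothetical meta-feature values (C_t, V_t, Z_t, S_t) for three tasks.
tasks = [
    {"name": "ASQP",  "features": (8, 0.9, 3.0, 0.6)},
    {"name": "SC-2",  "features": (2, 0.4, 1.0, 0.2)},
    {"name": "Irony", "features": (2, 0.5, 1.0, 0.9)},
]

curriculum = sorted(tasks, key=score)  # easiest first, hardest last
```

Note how the heavy subjectivity weight ($\alpha_4 = 5.0$) pushes Irony above binary classification despite their identical class counts.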

5. Unified Optimization Objective

Although MEM is a post-hoc weight-blending procedure, the joint process can be interpreted as minimizing the sum of expert task losses plus a regularizer that enforces proximity between MEM group parameters and their expert counterparts:

$$\min_{\{\Delta\theta_t\}, \{\mathbf{w}_g\}} \sum_{t=1}^T \mathcal{L}_t(\Delta\theta_t) + \lambda \sum_{g \in \{\mathrm{SC},\, \mathrm{ABSA},\, \mathrm{MAST}\}} \left\|\theta_g(\mathbf{w}_g)-\theta_g^*\right\|_2^2$$

subject to $\sum_{t \in g} w_t = 1$ and $w_t \geq 0$. In practice, the SFT and MEM stages are decoupled: task-specific LoRA optimization is followed by independent evolutionary weight search (Inoshita et al., 11 Jan 2026).
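Evaluating this objective for given losses and parameters is straightforward; a minimal sketch with illustrative numbers (the single-group parameter vectors and $\lambda$ are made up for the example):

```python
def mem_mcl_objective(task_losses, group_params, group_refs, lam=0.1):
    # sum_t L_t(dtheta_t) + lambda * sum_g ||theta_g(w_g) - theta_g*||_2^2
    reg = sum(sum((a - b) ** 2 for a, b in zip(theta, ref))
              for theta, ref in zip(group_params, group_refs))
    return sum(task_losses) + lam * reg

# Two task losses and one group whose merged parameters sit at squared
# distance 1 from the reference theta_g* (all values illustrative).
val = mem_mcl_objective([1.0, 2.0], [[1.0, 0.0]], [[0.0, 0.0]], lam=0.5)
# 3.0 + 0.5 * 1.0 = 3.5
```

The decoupling in the paper means this combined expression is never optimized jointly: the first term is minimized per task during SFT, and only the weights inside the second term are searched during MEM.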

6. Experimental Design, Results, and Implementation

Datasets: 15 subtasks including SC (2/3/5-way), ABSA variants, and MAST (emotion, hate, offensive, irony).

Metrics: SC/MB use accuracy; ABSA uses micro-F1; MAST uses macro-F1.

Baselines: Base Llama-2-7B and LoRA-tuned single-task experts.

Findings:

  • MEM improves on base LLM in 13/15 tasks, with 2–5 point absolute gains on ATSA/ACSA/ASQP.
  • MEM matches or exceeds LoRA experts on 12/15 tasks (indicating cross-task synergy).
  • MEM-MCL provides additional mean gains of 1.5 points over MEM on 8/15 tasks, especially for difficult categories.
  • Weak-data–guided MEM is crucial: its removal yields a 3-point average loss.
  • Random curriculum order reduces gains by about 1 point compared to metadata-driven ordering.

Implementation highlights:

| Component | Setting | Notes |
|---|---|---|
| Base Model | Llama-2-7B (open-source) | |
| LoRA (SFT) | rank 8, $\alpha$=16, dropout=0.1, 3 epochs | micro-batch=1, LR=3e-4, wd=0.05 |
| MEM | generations=20, pop=20, single-point crossover, mutation=0.1 | ~2 h per group |
| Curriculum | no trainable parameters | pure inference |
| Hardware | 4× NVIDIA A100 (40 GB), PyTorch ≥2.0 | HuggingFace Transformers & PEFT |
| Compute Time | ~12 h/expert, ~2 h/MEM group | MCL negligible |

7. Significance and Context

MEM-MCL achieves unified, multi-task sentiment analysis within a single LLM by integrating dedicated task-specific instruction tuning, evolutionary parameter blending with a focus on model "blind spots," and curriculum-based inference leveraging structured metadata. The combination outperforms conventional, non-specialized LLMs and matches or surpasses the performance of specialized expert models across a diverse set of sentiment tasks. This suggests that judiciously targeted parameter merging and curriculum learning can yield high accuracy and robustness in complex, multi-faceted NLP workflows (Inoshita et al., 11 Jan 2026).
