
Mistral-Instruct-v0.3: 7B Instruction-Tuned Transformer

Updated 28 December 2025
  • Mistral-Instruct-v0.3 is a 7B-parameter autoregressive transformer instruction-tuned for dialogue, adapted to new domains via continued pre-training (PPE) and instruction tuning (ASI) protocols.
  • Its adaptation methodologies combine domain-specific pre-training with versatile instruction tuning, leading to significant improvements in task accuracy and generalization.
  • Empirical evaluations showcase its efficiency with low energy costs and strong performance metrics, positioning it as a reference in lightweight, general-purpose LLM research.

Mistral-Instruct-v0.3 is a 7-billion-parameter, decoder-only, autoregressive transformer model instruction-tuned for dialogue and text generation tasks. As a foundation model, it supports a wide range of adaptation strategies, including continued domain pre-training and instruction finetuning, and is extensible for both zero-shot and in-context learning across diverse NLP tasks. Its open-weights accessibility, architectural modularity, and empirical efficiency position it as a reference point for research on lightweight, general-purpose LLMs and domain adaptation protocols.

1. Model Architecture and Instruction Tuning

Mistral-Instruct-v0.3 is built on a standard causal, multi-head attention transformer backbone, comprising 7 billion parameters. It is trained initially on large-scale web and code corpora via the cross-entropy objective for causal language modeling:

L_{\mathrm{CPT}} = -\sum_{i} y_i \log p_i

where y_i is the reference one-hot token distribution and p_i is the predicted probability vector. Instruction tuning (affinage sur instructions, ASI) is performed using curated (prompt, response) pairs structured as {system, user, agent} triples. The ASI loss is computed only over the agent sequence:

L_{\mathrm{inst}} = -\sum_{t \in T_{\text{agent}}} \sum_{v} y_{t,v} \log p_{t,v}

Key hyperparameters for training and instruction tuning include a per_device_train_batch_size ranging from 2 (pre-training) to 16 (finetuning), gradient_accumulation_steps to reach large effective batch sizes, a cosine learning-rate schedule, mixed precision (FP16), and gradient checkpointing. For robust instruction following, a diverse set of 45 system prompt templates is used during ASI.
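The agent-only loss above can be sketched in PyTorch by masking non-agent positions with the standard ignore index; this is a minimal illustration, not the authors' training code (the function name and tensor layout are assumptions, and labels are assumed already shifted for causal prediction):

```python
import math

import torch
import torch.nn.functional as F


def asi_loss(logits, labels, agent_mask):
    """Cross-entropy restricted to agent tokens: positions where agent_mask is
    False are set to -100, which F.cross_entropy ignores."""
    labels = labels.masked_fill(~agent_mask, -100)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,
    )


# Tiny sanity check: uniform (all-zero) logits over a 5-token vocabulary give
# a loss of log(5) on the two unmasked (agent) positions.
logits = torch.zeros(1, 4, 5)
labels = torch.tensor([[1, 2, 3, 4]])
agent_mask = torch.tensor([[False, False, True, True]])
loss = asi_loss(logits, labels, agent_mask)
```

In a real pipeline the mask would be derived from the {system, user, agent} template boundaries produced by the tokenizer's chat formatting.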

2. Methodologies for Domain Adaptation

Effective domain adaptation in Mistral-Instruct-v0.3 involves two sequential phases:

  1. Continued pre-training (poursuite du pré-entraînement, PPE): further pre-training on domain-specific plain-text segments, such as defense-sector documents, packed to the model’s context limit and batched for efficient cross-entropy minimization. Preprocessing removes short or low-content segments and maximizes coverage via block segmentation.
  2. Instruction fine-tuning (affinage sur instructions, ASI): supervised instruction tuning on a mixture of tasks drawn from the domain corpus and generalist sources, with data packing disabled to preserve task granularity and avoid overflow. ASI includes both synthetically generated and template-based instruction-response pairs to ensure robustness to prompt variation.
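The PPE packing step can be sketched as a greedy block segmentation over tokenized segments; this is an illustrative assumption about the preprocessing, with hypothetical names, not the exact pipeline:

```python
def pack_segments(tokenized_segments, context_len):
    """Concatenate tokenized plain-text segments and cut them into
    fixed-length blocks matching the model's context limit."""
    buffer, blocks = [], []
    for ids in tokenized_segments:
        buffer.extend(ids)
        while len(buffer) >= context_len:
            blocks.append(buffer[:context_len])
            buffer = buffer[context_len:]
    return blocks  # any trailing partial block is dropped in this sketch


blocks = pack_segments([[1, 2, 3], [4, 5], [6, 7, 8, 9]], context_len=4)
```

For ASI, by contrast, each instruction-response pair is kept as its own training example, which is what "data packing disabled" refers to above.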

Adaptation data is stratified into “fermé” (closed) and “ouvert” (open, extended) partitions, with segment and token counts documented (e.g., AMIAD: 54,865 segments / 48.6M tokens; AMIAD + Ouest-France: 65,775 segments / 60.1M tokens). Task-specific and generative instruction sets (summaries, multiple-choice, Q&A, acronyms) are generated using large LLMs and rule-based grammars.

3. Empirical Evaluation and Generalization

Performance evaluation leverages both quantitative and qualitative metrics:

  • Task accuracy (Acc): proportion of correct responses on multiple-choice questions (QCM), factual QA, and acronym expansion, with normalization for variant forms.
  • Generative MOS (Mean Opinion Score): summary and headline generation rated on a 0–5 scale using an LLM-as-a-judge.
  • Strict accuracy: for tasks such as IFEval, where format adherence is critical.
  • Carbon footprint analysis: energy use E (kWh) and CO₂-equivalent emissions C = E × α, where α is the grid carbon intensity.
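The carbon accounting in the last bullet is a single multiplication; as a sketch (the intensity value of ~51 gCO₂e/kWh is an assumption, chosen only because it is roughly consistent with the energy and emission figures reported in Section 5):

```python
def co2_grams(energy_kwh, alpha_g_per_kwh):
    """C = E * alpha: energy in kWh times grid carbon intensity in gCO2e/kWh."""
    return energy_kwh * alpha_g_per_kwh


# 18.1 kWh of PPE at an assumed ~51.4 gCO2e/kWh is on the order of 930 gCO2e.
ppe_emissions = co2_grams(18.1, 51.4)
```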

Performance gains are pronounced after ASI and optimal when PPE is combined with ASI incorporating both specific and generalist instructions. For instance, QCM accuracy on synthetic benchmarks rises from 57.5% (base) to 65.4% (ASI-only) and 65.6% (PPE+ASI with Tülu 3 Fr), with generative MOS for summaries also increasing markedly. On organizer-supplied datasets, factoid QA improves to 60% from a baseline of 10%. Notably, generalist task accuracy (e.g., on MMLU) is stable or improved if ASI contains out-of-domain/generalist instructions, while PPE alone can degrade such performance, underscoring the critical balance in adaptation protocol.

| Model | Summary (MOS) | QCM (Acc %) | Factual (Acc %) |
|---|---|---|---|
| Baseline | 3.24 | 57.5 | 5.6 |
| ASI-only (fermé) | 3.93 | 65.4 | 9.2 |
| PPE+ASI (ouvert+Tülu 3 Fr) | 3.66 | 65.6 | 11.3 |

Table 1: Extracted synthetic defense benchmark results (AMIAD-derived).

4. Data Selection and Instruction Generation

Data is systematically filtered for length and metadata artifacts. Instruction data covers summarization, titling, MCQs, factoid Q&A, and procedural prompts, sourced from generative LLMs (e.g., GPT-4.1-mini, GPT-4o-mini), domain templates (e.g., acronym translation), and generalist sets from Tülu 3 Fr (filtered for language with LangDetect). Random sampling assigns examples to training, validation, and test splits (80/10/10); instruction diversity comes from the breadth of sources rather than from any selection or scoring heuristic.
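The 80/10/10 split can be sketched as a seeded random shuffle followed by slicing; the helper name and seed are assumptions for illustration:

```python
import random


def split_dataset(examples, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Random train/validation/test split in the stated 80/10/10 proportions."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    items = list(examples)
    rng.shuffle(items)
    n = len(items)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


train, val, test = split_dataset(range(100))
```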

5. Adaptation Efficiency and Carbon Footprint

Resource accounting for domain adaptation is explicit: PPE and ASI, along with synthetic instruction generation, together incur a total energy cost under 100 kWh (~3 kgCO₂e), with breakdowns per task provided (e.g., PPE ~18 kWh, ASI ~7–12 kWh, instruction generation ~46–59 kWh). These values demonstrate the climate viability of adaptation for models at the 7B parameter scale, particularly relative to contemporary foundation models.

| Run | PPE (gCO₂e, kWh) | ASI (gCO₂e, kWh) | Gen. instr. (gCO₂e, kWh) |
|---|---|---|---|
| fermé 1/2 (ASI) | 0 | 385 (7.5) | 1420 (45.9) |
| ouvert+Tülu 3 Fr (ASI) | 930 (18.1) | 623 (12.2) | 1730 (58.9) |

6. Trade-offs and Recommendations

The adaptation regime exposes a classic trade-off: continued pre-training (PPE) is required to inject domain-specific factual knowledge, but must be followed by robust, mixed-domain instruction tuning to preserve and enhance generalist and format-following capabilities. Empirical results confirm that a two-phase protocol, domain-text PPE plus diverse ASI, yields a domain-specialized 7B model with negligible generalization loss (and in some cases gains) and sharply improved domain performance at modest energy cost. The findings from O_FT@EvalLLM2025 demonstrate the practicality and reproducibility of effective domain adaptation for compact LLMs (Rousseau et al., 7 Jul 2025).
