
Mistral-Instruct-v0.3: 7B Instruction-Tuned Transformer

Updated 28 December 2025
  • Mistral-Instruct-v0.3 is a 7B-parameter autoregressive transformer instruction-tuned for dialogue, adapted to new domains via continued pre-training (PPE) and instruction tuning (ASI) protocols.
  • Its adaptation methodologies combine domain-specific pre-training with versatile instruction tuning, leading to significant improvements in task accuracy and generalization.
  • Empirical evaluations showcase its efficiency with low energy costs and strong performance metrics, positioning it as a reference in lightweight, general-purpose LLM research.

Mistral-Instruct-v0.3 is a 7-billion-parameter, decoder-only, autoregressive transformer model instruction-tuned for dialogue and text generation tasks. As a foundation model, it supports a wide range of adaptation strategies, including continued domain pre-training and instruction finetuning, and is extensible for both zero-shot and in-context learning across diverse NLP tasks. Its open-weights accessibility, architectural modularity, and empirical efficiency position it as a reference point for research on lightweight, general-purpose LLMs and domain adaptation protocols.

1. Model Architecture and Instruction Tuning

Mistral-Instruct-v0.3 is built on a standard causal, multi-head attention transformer backbone, comprising 7 billion parameters. It is trained initially on large-scale web and code corpora via the cross-entropy objective for causal language modeling:

L_{\mathrm{CPT}} = -\sum_{i} y_i \log p_i

where y_i is the reference one-hot token distribution and p_i is the predicted probability vector. Instruction tuning (affinage sur instructions, ASI) is performed using curated (prompt, response) pairs structured as {system, user, agent} triples. The ASI loss is computed only over the agent sequence:

L_{\mathrm{inst}} = -\sum_{t \in T_{\text{agent}}} \sum_{v} y_{t,v} \log p_{t,v}

Key hyperparameters for training and instruction tuning include a per_device_train_batch_size ranging from 2 (pre-training) to 16 (finetuning), gradient_accumulation_steps to reach large effective batch sizes, a cosine learning-rate schedule, mixed precision (FP16), and gradient checkpointing. For robust instruction following, a diverse set of 45 system prompt templates is used during ASI.
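The agent-only loss above can be sketched in PyTorch by masking non-agent positions with the standard ignore index; this is a minimal illustration, not the authors' training code (the function name and tensor layout are assumptions, and labels are assumed already shifted for causal prediction):

```python
import math

import torch
import torch.nn.functional as F


def asi_loss(logits, labels, agent_mask):
    """Cross-entropy restricted to agent tokens: positions where agent_mask is
    False are set to -100, which F.cross_entropy ignores."""
    labels = labels.masked_fill(~agent_mask, -100)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=-100,
    )


# Tiny sanity check: uniform (all-zero) logits over a 5-token vocabulary give
# a loss of log(5) on the two unmasked (agent) positions.
logits = torch.zeros(1, 4, 5)
labels = torch.tensor([[1, 2, 3, 4]])
agent_mask = torch.tensor([[False, False, True, True]])
loss = asi_loss(logits, labels, agent_mask)
```

In a real pipeline the mask would be derived from the {system, user, agent} template boundaries produced by the tokenizer's chat formatting.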

2. Methodologies for Domain Adaptation

Effective domain adaptation in Mistral-Instruct-v0.3 involves two sequential phases:

  1. Continued pre-training (poursuite du pré-entraînement, PPE): further pre-training on domain-specific plain-text segments, such as defense-sector documents, packed to the model’s context limit and batched for efficient cross-entropy minimization. Preprocessing removes short or low-content segments and maximizes coverage via block segmentation.
  2. Instruction fine-tuning (affinage sur instructions, ASI): supervised instruction tuning on a mixture of tasks drawn from the domain corpus and generalist sources, with data packing disabled to preserve task granularity and avoid overflow. ASI includes both synthetically generated and template-based instruction-response pairs to ensure robustness to prompt variation.
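The PPE packing step can be sketched as a greedy block segmentation over tokenized segments; this is an illustrative assumption about the preprocessing, with hypothetical names, not the exact pipeline:

```python
def pack_segments(tokenized_segments, context_len):
    """Concatenate tokenized plain-text segments and cut them into
    fixed-length blocks matching the model's context limit."""
    buffer, blocks = [], []
    for ids in tokenized_segments:
        buffer.extend(ids)
        while len(buffer) >= context_len:
            blocks.append(buffer[:context_len])
            buffer = buffer[context_len:]
    return blocks  # any trailing partial block is dropped in this sketch


blocks = pack_segments([[1, 2, 3], [4, 5], [6, 7, 8, 9]], context_len=4)
```

For ASI, by contrast, each instruction-response pair is kept as its own training example, which is what "data packing disabled" refers to above.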

Adaptation data is stratified into “fermé” (closed) and “ouvert” (open, extended) partitions, with segment and token counts documented (e.g., AMIAD: 54,865 segments / 48.6M tokens; AMIAD + Ouest-France: 65,775 segments / 60.1M tokens). Task-specific and generative instruction sets (summaries, multiple-choice, Q&A, acronyms) are generated using large LLMs and rule-based grammars.

3. Empirical Evaluation and Generalization

Performance evaluation leverages both quantitative and qualitative metrics:

  • Task accuracy (Acc): proportion of correct responses on multiple-choice questions (QCM), factual QA, and acronym expansion, with normalization for variant forms.
  • Generative MOS (Mean Opinion Score): summary and headline generation rated on a 0–5 scale using an LLM-as-a-judge.
  • Strict accuracy: for tasks such as IFEval, where format adherence is critical.
  • Carbon footprint analysis: energy use E (kWh) and CO₂-equivalent emissions C = E × α, where α is the grid carbon intensity.
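The carbon accounting in the last bullet is a single multiplication; as a sketch (the intensity value of ~51 gCO₂e/kWh is an assumption, chosen only because it is roughly consistent with the energy and emission figures reported in Section 5):

```python
def co2_grams(energy_kwh, alpha_g_per_kwh):
    """C = E * alpha: energy in kWh times grid carbon intensity in gCO2e/kWh."""
    return energy_kwh * alpha_g_per_kwh


# 18.1 kWh of PPE at an assumed ~51.4 gCO2e/kWh is on the order of 930 gCO2e.
ppe_emissions = co2_grams(18.1, 51.4)
```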

Performance gains are pronounced after ASI and optimal when PPE is combined with ASI incorporating both specific and generalist instructions. For instance, QCM accuracy on synthetic benchmarks rises from 57.5% (base) to 65.4% (ASI-only) and 65.6% (PPE+ASI with Tülu 3 Fr), with generative MOS for summaries also increasing markedly. On organizer-supplied datasets, factoid QA improves to 60% from a baseline of 10%. Notably, generalist task accuracy (e.g., on MMLU) is stable or improved if ASI contains out-of-domain/generalist instructions, while PPE alone can degrade such performance, underscoring the critical balance in adaptation protocol.

| Model | Summary (MOS) | QCM (Acc %) | Factual (Acc %) |
|---|---|---|---|
| Baseline | 3.24 | 57.5 | 5.6 |
| ASI-only (fermé) | 3.93 | 65.4 | 9.2 |
| PPE+ASI (ouvert+Tülu 3 Fr) | 3.66 | 65.6 | 11.3 |

Table 1: Extracted synthetic defense benchmark results (AMIAD-derived).

4. Data Selection and Instruction Generation

Data is systematically filtered for length and metadata artifacts. Instruction data covers summarization, titling, MCQs, factoid Q&A, and procedural prompts, sourced from generative LLMs (e.g., GPT-4.1-mini, GPT-4o-mini), domain templates (e.g., acronym translation), and generalist sets from Tülu 3 Fr (filtered for language with LangDetect). Random sampling assigns examples to training, validation, and test splits (80/10/10); instruction diversity comes from the breadth of sources rather than from any selection or scoring heuristic.
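The 80/10/10 split can be sketched as a seeded random shuffle followed by slicing; the helper name and seed are assumptions for illustration:

```python
import random


def split_dataset(examples, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Random train/validation/test split in the stated 80/10/10 proportions."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    items = list(examples)
    rng.shuffle(items)
    n = len(items)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


train, val, test = split_dataset(range(100))
```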

5. Adaptation Efficiency and Carbon Footprint

Resource accounting for domain adaptation is explicit: PPE and ASI, along with synthetic instruction generation, together incur a total energy cost under 100 kWh (~3 kgCO₂e), with breakdowns per task provided (e.g., PPE ~18 kWh, ASI ~7–12 kWh, instruction generation ~46–59 kWh). These values demonstrate the climate viability of adaptation for models at the 7B parameter scale, particularly relative to contemporary foundation models.

| Run | PPE (gCO₂e, kWh) | ASI (gCO₂e, kWh) | Gen. instr. (gCO₂e, kWh) |
|---|---|---|---|
| fermé 1/2 (ASI) | 0 | 385 (7.5) | 1420 (45.9) |
| ouvert+Tülu 3 Fr (ASI) | 930 (18.1) | 623 (12.2) | 1730 (58.9) |

6. Trade-offs and Recommendations

The adaptation regime exposes a classic trade-off: continued pre-training (PPE) is required to inject domain-specific factual knowledge, but must be followed by robust, mixed-domain instruction tuning to preserve and enhance generalist and format-following capabilities. Empirical results confirm that a two-phase protocol, domain-text PPE plus diverse ASI, yields a domain-specialized 7B model with negligible generalization loss (and in some cases gains) and sharply improved domain performance at modest energy cost. The findings from O_FT@EvalLLM2025 demonstrate the practicality and reproducibility of effective domain adaptation for compact LLMs (Rousseau et al., 7 Jul 2025).
