Optimal teacher prompts for distillation

Determine which instruction prompts, when provided to the instruction-following teacher embedding model Qwen3-Embedding-4B, yield the most empirically useful supervision during knowledge distillation to the student embedding models jina-embeddings-v5-text-small and jina-embeddings-v5-text-nano, so as to minimize ambiguity and improve transfer effectiveness when generating query and document embeddings.

Background

The distillation stage uses Qwen3-Embedding-4B as an instruction-following teacher while training smaller student encoders. Instruction prompts can meaningfully affect the embeddings the teacher produces, but the authors avoided heavy prompt engineering because it is not established which instructions perform best for distillation.

To reduce ambiguity and improve transfer, the authors employed minimal instructions (generic query/document prefixes for the student and a single default retrieval instruction for the teacher). Identifying empirically optimal teacher prompts could further enhance the effectiveness of the distillation process across tasks and languages.
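The minimal-instruction setup described above can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the teacher template follows the "Instruct: ...\nQuery: ..." convention documented for Qwen3-Embedding, the default retrieval instruction is Qwen's published default, the student prefix `query:` is hypothetical, and the cosine-based objective is a generic stand-in for whatever distillation loss the authors actually use.

```python
import numpy as np

# Assumed default retrieval instruction (Qwen3-Embedding's documented default;
# the paper's exact teacher instruction is not shown here).
DEFAULT_RETRIEVAL_INSTRUCTION = (
    "Given a web search query, retrieve relevant passages that answer the query"
)

def format_teacher_query(query: str,
                         instruction: str = DEFAULT_RETRIEVAL_INSTRUCTION) -> str:
    """Prepend a task instruction to the query, in the style of
    instruction-following teachers such as Qwen3-Embedding."""
    return f"Instruct: {instruction}\nQuery: {query}"

def format_student_query(query: str) -> str:
    """Students see only a generic prefix (hypothetical choice of 'query:'),
    matching the minimal-instruction setup."""
    return f"query: {query}"

def cosine_distill_loss(student_emb: np.ndarray, teacher_emb: np.ndarray) -> float:
    """Toy distillation objective: 1 - cosine similarity between the
    L2-normalized student and teacher embeddings of the same input."""
    s = student_emb / np.linalg.norm(student_emb)
    t = teacher_emb / np.linalg.norm(teacher_emb)
    return float(1.0 - s @ t)
```

Under this setup, the research question amounts to searching over `instruction` values for the teacher and measuring which choice minimizes the downstream loss of the trained student across tasks and languages.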

References

However, it leads to ambiguity when we do not know what instructions are empirically most useful and makes it harder for us to transfer knowledge through distillation.

jina-embeddings-v5-text: Task-Targeted Embedding Distillation  (2602.15547 - Akram et al., 17 Feb 2026) in Section 4.1 First-Stage: Embedding Distillation