Optimal balance and automatic switching between reasoning and non‑reasoning modes

Determine the optimal ratio of reasoning‑trace to direct‑response data for supervised fine‑tuning of mixed‑mode multimodal models that use explicit <think>/<nothink> tokens (e.g., mid‑fusion models such as Phi‑4‑reasoning‑vision‑15B). Also establish how to ensure that such a model switches appropriately and automatically between chain‑of‑thought and non‑reasoning modes across tasks and deployment contexts.

Background

The model is trained with a hybrid mixture: reasoning samples include <think>…</think> traces for domains like math and science, while non‑reasoning samples begin with a <nothink> token for perception‑focused tasks such as captioning, grounding, and OCR. The current split is roughly 20% reasoning and 80% non‑reasoning.
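
To make the mixture concrete, the following is a minimal sketch of how tagged training targets could be assembled at a chosen reasoning fraction. It is not the authors' pipeline: the function names, the sampling-with-replacement strategy, and the closing </think> tag are assumptions made for illustration. Only the <think> and <nothink> markers and the roughly 20/80 split come from the description above.

```python
import random

# Mode markers. <think> and <nothink> come from the text above;
# the closing </think> tag is an assumption of this sketch.
THINK_OPEN, THINK_CLOSE, NOTHINK = "<think>", "</think>", "<nothink>"


def format_reasoning(trace, answer):
    """Wrap a reasoning trace in think tags, followed by the final answer."""
    return f"{THINK_OPEN}{trace}{THINK_CLOSE}{answer}"


def format_direct(answer):
    """Prefix a direct (non-reasoning) answer with the nothink token."""
    return f"{NOTHINK}{answer}"


def build_mixture(reasoning, direct, reasoning_fraction, total, seed=0):
    """Sample `total` training targets with the requested reasoning share.

    reasoning: list of (trace, answer) pairs (hypothetical pool).
    direct:    list of answer strings (hypothetical pool).
    Sampling is with replacement, so small pools may repeat.
    """
    rng = random.Random(seed)
    n_reasoning = round(total * reasoning_fraction)
    n_direct = total - n_reasoning
    targets = [format_reasoning(t, a)
               for t, a in rng.choices(reasoning, k=n_reasoning)]
    targets += [format_direct(a) for a in rng.choices(direct, k=n_direct)]
    rng.shuffle(targets)
    return targets


# Example: a 20% reasoning / 80% direct mixture of 10 targets.
mixture = build_mixture(
    reasoning=[("2 + 2 equals 4.", "4")],
    direct=["A cat sitting on a sofa."],
    reasoning_fraction=0.2,
    total=10,
)
```

Exposing the fraction as a parameter makes it straightforward to sweep the ratio per domain, which is the quantity the open problem asks about.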

The authors acknowledge that the learned boundary between modes is imprecise and that the chosen data split may not be optimal for all domains or deployments. They explicitly identify the need to determine the ideal balance and to ensure appropriate automatic mode switching as an open problem.
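
One way to quantify how well the learned boundary matches expectations is to check, per task, how often the model opens its response in the expected mode. The sketch below assumes a hypothetical evaluation set in which each example carries a hand-assigned expected mode. Such labels are themselves a judgment call, the record layout and function names are illustrative, and nothing here is taken from the report.

```python
from collections import defaultdict


def detect_mode(response):
    """Classify a response by its leading mode token."""
    text = response.lstrip()
    if text.startswith("<think>"):
        return "reasoning"
    if text.startswith("<nothink>"):
        return "direct"
    return "unknown"


def mode_agreement(records):
    """Return, per task, the fraction of responses in the expected mode.

    records: iterable of (task, expected_mode, response) tuples, where
    expected_mode is "reasoning" or "direct" (hypothetical labels).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for task, expected, response in records:
        totals[task] += 1
        hits[task] += int(detect_mode(response) == expected)
    return {task: hits[task] / totals[task] for task in totals}


# Example records with made-up responses.
records = [
    ("math", "reasoning", "<think>2 + 2 equals 4.</think>4"),
    ("math", "reasoning", "<nothink>4"),
    ("ocr", "direct", "<nothink>STOP"),
]
agreement = mode_agreement(records)
```

Tracking this agreement while varying the training ratio would show whether a given split shifts the switching behavior for particular task families.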

References

Determining the ideal data balance, and ensuring that the model switches appropriately between modes, remains an open research problem.

— Phi-4-reasoning-vision-15B Technical Report (2603.03975 - Aneja et al., 4 Mar 2026) in Limitations and open questions, Section 4 (Mixed Non-Reasoning and Reasoning)
