Optimal balance and automatic switching between reasoning and non‑reasoning modes
Determine the optimal ratio of reasoning-trace to direct-response data for supervised fine-tuning of mixed-mode multimodal models that use explicit <think>/<nothink> tokens (e.g., mid-fusion models such as Phi-4-reasoning-vision-15B). Establish how to ensure that the model switches automatically and appropriately between chain-of-thought and non-reasoning modes across tasks and deployment contexts.
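To make the quantity in question concrete, the following is a minimal sketch of how a reasoning-to-direct data ratio could be exposed as a single parameter when assembling a fine-tuning set. It is not the method of the technical report: the helper `mix_sft_examples`, the `reasoning_fraction` parameter, the dictionary keys, and the way the <think>/<nothink> tokens are placed in the target string are all assumptions made for illustration.

```python
import random
from typing import Dict, List

# Mode tokens named in the problem statement; how they are laid out inside a
# training target is an assumption made for this sketch.
THINK, NOTHINK = "<think>", "<nothink>"


def mix_sft_examples(
    reasoning_pool: List[Dict[str, str]],
    direct_pool: List[Dict[str, str]],
    reasoning_fraction: float,
    total: int,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Sample `total` SFT examples with a fixed share of reasoning traces.

    Hypothetical helper: pool entries carry "prompt" and "answer" keys, and
    reasoning entries also carry a "trace" key.
    """
    if not 0.0 <= reasoning_fraction <= 1.0:
        raise ValueError("reasoning_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    n_reasoning = round(total * reasoning_fraction)
    n_direct = total - n_reasoning

    mixed: List[Dict[str, str]] = []
    # Sample with replacement so a small pool can still fill its quota.
    for ex in rng.choices(reasoning_pool, k=n_reasoning):
        target = f"{THINK}{ex['trace']}\n{ex['answer']}"
        mixed.append({"prompt": ex["prompt"], "target": target})
    for ex in rng.choices(direct_pool, k=n_direct):
        target = f"{NOTHINK}{ex['answer']}"
        mixed.append({"prompt": ex["prompt"], "target": target})

    # Interleave the two modes so neither is clustered within the set.
    rng.shuffle(mixed)
    return mixed


# Toy pools, invented for illustration only.
reasoning_pool = [{"prompt": "What is 17 * 3?", "trace": "17 * 3 = 51.", "answer": "51"}]
direct_pool = [{"prompt": "What color is the sky in the image?", "answer": "Blue"}]
batch = mix_sft_examples(reasoning_pool, direct_pool, reasoning_fraction=0.25, total=8)
```

Under these assumptions, `reasoning_fraction` is the knob whose best value the problem statement leaves open. Sweeping it and measuring both task accuracy and how often the model picks the appropriate mode would be one way to study it. The sketch says nothing about which value is best.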
References
Determining the ideal data balance, and ensuring that the model switches appropriately between modes, remains an open research problem.
— Phi-4-reasoning-vision-15B Technical Report
(2603.03975 - Aneja et al., 4 Mar 2026) in Limitations and open questions, Section 4 (Mixed Non-Reasoning and Reasoning)