Dataset structuring for generalizable multimodal reasoning across tasks
Determine how to structure multimodal training datasets so that they induce generalizable representations across diverse reasoning tasks, specifically when training a single model to be proficient at both mathematics and computer use.
References
It is an open question in the research community to understand how datasets should be structured to induce generalizable representations across diverse reasoning tasks.
— Phi-4-reasoning-vision-15B Technical Report
(2603.03975 - Aneja et al., 4 Mar 2026) in Section 3.2 (Mathematics and Science vs. Computer-Use Data Proportion)