Data dynamics under extreme task‑imbalance ratios

Characterize the training dynamics and generalization behavior of multimodal reasoning models under extreme task‑imbalance ratios (approximately 1% or less) between categories such as mathematics and computer‑use, especially in settings with competing reasoning tasks.

Background

The authors’ ablations used relatively mild imbalance ratios (e.g., around 7.5% math data), and they reference long‑tailed classification literature where balancing strategies can improve worst‑group accuracy.

They note that how extreme imbalance affects learning and performance in competing multimodal reasoning tasks remains an open problem, particularly when one task class represents 1% or less of the data.
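To make the gap between the two regimes concrete, the sketch below computes how many examples a minority task receives at a given mixture ratio and draws a small random mixture at that ratio. It is purely illustrative: the total of 100,000 examples, the function names, and the sampling scheme are assumptions for this example and are not taken from the report.

```python
import random
from collections import Counter


def mixture_counts(total, minority_fraction):
    """Split `total` examples into (minority, majority) counts."""
    if not 0.0 <= minority_fraction <= 1.0:
        raise ValueError("minority_fraction must lie in [0, 1]")
    minority = round(total * minority_fraction)
    return minority, total - minority


def sample_tasks(n, minority_fraction, seed=0):
    """Draw n task labels, treating math as the minority task (an assumption)."""
    rng = random.Random(seed)
    labels = rng.choices(
        ["math", "computer_use"],
        weights=[minority_fraction, 1.0 - minority_fraction],
        k=n,
    )
    return Counter(labels)


# Hypothetical dataset size, chosen only to make the arithmetic easy to read.
TOTAL = 100_000
mild = mixture_counts(TOTAL, 0.075)    # ratio similar to the ablations
extreme = mixture_counts(TOTAL, 0.01)  # the extreme regime in question
drawn = sample_tasks(10_000, 0.01)

print("mild (math, computer_use):", mild)
print("extreme (math, computer_use):", extreme)
print("sampled at 1%:", dict(drawn))
```

At the same dataset size, moving from 7.5% to 1% cuts the minority task from 7,500 to 1,000 examples, so each minority example is seen far less often relative to the majority task in a uniformly shuffled mixture.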

References

While well-studied in traditional machine learning settings such as long-tailed classification, understanding data dynamics at more extreme ratios (1% or less) is an open problem, especially for performance on competing reasoning tasks.

Phi-4-reasoning-vision-15B Technical Report (2603.03975 - Aneja et al., 4 Mar 2026) in Open research questions, Section 3.2 (Mathematics and Science vs. Computer-Use Data Proportion)