Family-internal regression of ministral-3:14b relative to ministral-3:8b

Determine why the 14B model ministral-3:14b is dominated in every reported tier by its 8B sibling ministral-3:8b on the AgentFloor benchmark, and identify the mechanism responsible for this underperformance.

Background

In all three AgentFloor tiers for which the authors report the comparison (A0, A, and B), the 14B ministral-3:14b model scores below its 8B family counterpart, ministral-3:8b: 49 vs 75 on A0, 24 vs 84 on A, and 28 vs 84 on B. The authors characterize this as a large-magnitude, family-internal regression.
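
To make the size of the regression concrete, the following minimal sketch tabulates the per-tier gap from the scores quoted in the excerpt under References. The names `scores` and `tier_gaps` are hypothetical and chosen only for illustration; the excerpt does not state the unit of the scores, so the gaps are plain differences in whatever unit the paper uses.

```python
# Scores as quoted in the paper's Discussion excerpt:
# tier -> (ministral-3:14b score, ministral-3:8b score).
scores = {
    "A0": (49, 75),
    "A": (24, 84),
    "B": (28, 84),
}


def tier_gaps(scores):
    """Return {tier: 8B score minus 14B score} for each tier."""
    return {tier: small - large for tier, (large, small) in scores.items()}


gaps = tier_gaps(scores)

# "Dominated" here means the 8B model scores strictly higher in every tier.
dominated = all(gap > 0 for gap in gaps.values())

for tier, gap in gaps.items():
    large, small = scores[tier]
    print(f"{tier}: 14B={large}, 8B={small}, gap={gap}")
print("14B dominated in every tier:", dominated)
```

The gap is smallest on A0 and more than twice as large on A and B, which is a descriptive pattern only; it does not by itself point to a mechanism.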

They state explicitly that they do not fully understand this phenomenon, so the cause of the regression remains unresolved within their study.

References

Three observations we do not fully understand. Three cells in the corpus are descriptively striking and resist clean explanation. Ministral-3:14b is dominated cell by cell by its 8B sibling (A0 49 vs 75; A 24 vs 84; B 28 vs 84) --- a family-internal regression of large magnitude.

AgentFloor: How Far Up the Tool Use Ladder Can Small Open-Weight Models Go? (2605.00334 - Karmakar et al., 1 May 2026) in Discussion, Section 7 ("Three observations we do not fully understand.")