Capacity limitations of DeepSeek-LLM 1.3B under mixed code-and-math one-stage training
Determine whether the degradation in mathematical reasoning without tool use, observed when DeepSeek-LLM 1.3B is trained in a single stage on a mixture of code and mathematical data, is caused by the model's limited capacity to assimilate both data types simultaneously.
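To make the experimental contrast concrete, the sketch below illustrates the two data schedules at issue: single-stage training on interleaved code-and-math batches versus two-stage training (code first, then math). The dataset contents, mixture ratio, batch size, and function names are illustrative assumptions, not the paper's actual configuration.

```python
import random

# Assumed mixture ratio for illustration only; the paper's actual
# token-level proportions are not reproduced here.
CODE_FRACTION = 0.5


def one_stage_batches(code_data, math_data, num_batches, batch_size, seed=0):
    """Single-stage schedule: every batch interleaves code and math samples,
    so the model must assimilate both distributions at the same time."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        yield [
            rng.choice(code_data if rng.random() < CODE_FRACTION else math_data)
            for _ in range(batch_size)
        ]


def two_stage_batches(code_data, math_data, num_batches, batch_size, seed=0):
    """Two-stage schedule: code-only batches first, then math-only batches,
    the arrangement the paper contrasts with single-stage mixing."""
    rng = random.Random(seed)
    code_stage = num_batches // 2  # assumed even split between stages
    for step in range(num_batches):
        source = code_data if step < code_stage else math_data
        yield [rng.choice(source) for _ in range(batch_size)]


if __name__ == "__main__":
    # Toy placeholder corpora standing in for code and math training documents.
    code_corpus = [f"code_doc_{i}" for i in range(100)]
    math_corpus = [f"math_doc_{i}" for i in range(100)]
    first_mixed = next(one_stage_batches(code_corpus, math_corpus, 10, 4))
    print("one-stage batch:", first_mixed)
```

Under this framing, the conjecture predicts that a 1.3B-parameter model trained with `one_stage_batches` underperforms on tool-free mathematical reasoning relative to the same model trained with `two_stage_batches`, because the mixed batches force it to fit both distributions at once.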
References
One conjecture is that DeepSeek-LLM 1.3B, due to its limited scale, lacks the capacity to fully assimilate both code and mathematical data simultaneously.
— Shao et al., 2024. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (arXiv:2402.03300), Section 6.1, "Code Training Benefits Mathematical Reasoning" (Results)