The paper surveys prior approaches spanning small expert models, robotics foundation models, and neuro-symbolic hybrids, and argues that none fully bridge high-level task reasoning with low-level continuous control across diverse tasks. This gap underlies the difficulty of creating broadly applicable robot policies that work without extensive in-domain data collection or fine-tuning.
The proposed Language Movement Primitives (LMPs) framework partially addresses this gap by grounding the reasoning of a vision-language model in the parameters of Dynamic Movement Primitives (DMPs), which enables zero-shot manipulation through a small set of interpretable control parameters. The paper nonetheless identifies truly general-purpose robot policies as an open challenge.
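
To make "interpretable control parameters" concrete, the sketch below rolls out a single one-dimensional DMP using the standard textbook formulation: a critically damped spring toward a goal, plus a learned forcing term gated by a decaying phase variable. This is an illustration only, not the paper's implementation. The function name `rollout_dmp`, the gain values, and the basis-function layout are all assumptions chosen for the example.

```python
import math


def rollout_dmp(y0, goal, weights, duration=1.0, dt=0.001,
                alpha_z=25.0, alpha_x=4.0):
    """Integrate a 1-D discrete DMP with explicit Euler steps.

    Hypothetical helper for illustration; gains and basis layout are
    common textbook defaults, not values taken from the paper.
    """
    beta_z = alpha_z / 4.0  # critical damping of the spring-damper
    n = len(weights)
    # Basis centers are spaced evenly in time, i.e. exponentially in phase x.
    centers = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    widths = [n ** 1.5 / c / alpha_x for c in centers]

    y, z, x = float(y0), 0.0, 1.0  # position, scaled velocity, phase
    trajectory = [y]
    for _ in range(round(duration / dt)):
        # Normalized weighted sum of Gaussian basis functions.
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centers)]
        psi_sum = sum(psi)
        shape = (sum(p * w for p, w in zip(psi, weights)) / psi_sum
                 if psi_sum > 0 else 0.0)
        # The forcing term fades as the phase x decays toward zero.
        forcing = shape * x * (goal - y0)

        # Derivatives from the current state; duration acts as the
        # temporal scaling factor tau.
        dz = (alpha_z * (beta_z * (goal - y) - z) + forcing) / duration
        dy = z / duration
        dx = -alpha_x * x / duration

        z += dz * dt
        y += dy * dt
        x += dx * dt
        trajectory.append(y)
    return trajectory


if __name__ == "__main__":
    # Zero weights give a plain point-to-point reach toward the goal.
    reach = rollout_dmp(y0=0.0, goal=1.0, weights=[0.0] * 10)
    print(f"start={reach[0]:.3f}, end={reach[-1]:.3f}")
```

Here `goal`, `duration`, and `weights` each have a direct physical reading: where the motion ends, how long it takes, and how its path is shaped along the way. In an LMP-style pipeline these would presumably be the kind of quantities a vision-language model proposes, but how the paper maps model output onto DMP parameters is not specified in this summary.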