
LMPs: Language-Guided Robot Motion Primitives

Updated 9 February 2026
  • Language Movement Primitives (LMPs) are computational frameworks that translate free-form natural language instructions into parameterized robot motions.
  • They integrate hybrid task-frame control and DMP parameterization to support zero-shot or few-shot manipulation, achieving success rates up to 83% in advanced models.
  • LMPs offer scalable, semantically interpretable motion synthesis while highlighting challenges such as prompt dependency and limited dynamic scene adaptability.

Language Movement Primitives (LMPs) refer to a class of frameworks and computational abstractions that bridge natural language task representations with physically grounded motor behaviors, supporting a direct, semantically meaningful translation from human instructions to robot motion. LMPs embed language-understandable primitives or trajectories at a level suitable for direct execution by robot controllers, rather than relying solely on symbolic planning or dense motion scripting. Recent approaches instantiate LMPs via structured interfaces between large language (or vision–language) models and hybrid position/force controllers, or through parameterizing analytic trajectory generators such as Dynamic Movement Primitives (DMPs), thus enabling zero-shot or few-shot manipulation from free-form instructions (Cao et al., 2023, Dai et al., 2 Feb 2026, Saunders et al., 2021).

1. Foundational Concepts and Definitions

LMPs formalize the process of converting unconstrained linguistic instructions—such as "wipe the plate" or "insert the straw"—into parameterized motion descriptions that can be robustly executed by robots. Two principal instantiations have been described:

  • Hybrid Task-Frame Primitives: High-level tasks are mapped to hybrid position/force control set-points within object-centered task frames. Each primitive is defined by a local coordinate transformation, a set of six position/force (or velocity/torque) targets, and a binary mask indicating controllable axes (Cao et al., 2023).
  • Dynamic Movement Primitives (DMPs): Here, LMPs are defined as language-conditioned parameter sets for DMPs, wherein a VLM predicts trajectory-shaping parameters (weights, goal positions) to instantiate smooth, stable robot motions. The DMP acts as an attractor system for each degree of freedom, ensuring the robot completes the articulated motion (Dai et al., 2 Feb 2026).

The unifying notion is that LMPs insert a compact, interpretable primitive layer—grounded in control theory—between the semantics of language and the requirements of continuous robot execution.

2. Task Frame Formalism and Hybrid Control

Classical layered robotics architectures divide the system into perception, planning, and execution. LMPs, when grounded in the Task Frame Formalism (TFF), address the persistent "symbol-to-motion" gap by introducing object-centric, programmatically defined control primitives.

A TFF-based LMP includes:

  • A local task frame {T}, specified as a homogeneous transformation $^{W}T_T \in SE(3)$.
  • A 6-vector $r_d = [p_d, \omega_d]^\top$, encoding three position/force and three velocity/torque set-points.
  • A binary mask $s \in \{0,1\}^6$, indicating which axes are active and what type of control (position or force) is applied.
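A minimal data container for such a primitive might look like the following sketch; the field names and the mask convention (1 = force-controlled axis) are illustrative assumptions, not taken from the cited work:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TaskFramePrimitive:
    """A TFF-based LMP: task frame, set-points, and axis-selection mask."""
    T_frame: np.ndarray  # 4x4 homogeneous transform from world {W} to task frame {T}
    r_d: np.ndarray      # 6-vector: three position/force + three velocity/torque set-points
    mask: np.ndarray     # binary 6-vector: 1 = force-controlled axis, 0 = position-controlled

    def __post_init__(self):
        assert self.T_frame.shape == (4, 4)
        assert self.r_d.shape == (6,)
        assert set(np.unique(self.mask)) <= {0, 1}
```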

Hybrid controllers (e.g., Raibert–Craig) implement these primitives:

$$u = K_p \, s_p \, (p_d - p) + K_f \, s_f \, (f_d - f)$$

where $s_p$ and $s_f$ select position vs. force axes. This gives LMPs an explicit mapping from language-parameterized primitive definitions to actionable low-level commands (Cao et al., 2023).
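The control law above can be sketched in a few lines; the scalar gains and the mask convention (1 = force-controlled axis, so $s_f = s$ and $s_p = 1 - s$) are illustrative assumptions:

```python
import numpy as np

def hybrid_control(p, f, p_d, f_d, s, Kp=100.0, Kf=0.5):
    """Raibert-Craig style hybrid position/force law.

    s is a binary 6-vector selecting force-controlled axes (1) vs
    position-controlled axes (0). Gains Kp, Kf are illustrative scalars.
    Returns the 6-vector command u = Kp*s_p*(p_d - p) + Kf*s_f*(f_d - f).
    """
    s = np.asarray(s, dtype=float)
    s_f = s          # force-controlled axes
    s_p = 1.0 - s    # position-controlled axes (complementary mask)
    return Kp * s_p * (np.asarray(p_d) - np.asarray(p)) \
         + Kf * s_f * (np.asarray(f_d) - np.asarray(f))
```

With the mask all zeros the law reduces to pure position control; all ones gives pure force control, matching the axis-selection semantics of the TFF primitive.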

3. VLM-Based DMP Parameterization for Robot Manipulation

An alternative LMP paradigm relies on Dynamic Movement Primitives. DMPs represent each controlled degree of freedom as a spring–damper system perturbed by a nonlinear forcing term $f(s)$:

$$\tau \dot{v} = \alpha_z [\beta_z (g - p) - v] + f(s), \qquad \tau \dot{p} = v$$

with a canonical system enforcing $s \rightarrow 0$. The forcing term $f(s)$ is generated as a weighted sum of Gaussian basis functions, with weights $w_i$ set to shape the trajectory as desired.
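A single-DOF rollout of these equations can be sketched with explicit Euler integration; the gain defaults and basis-width heuristic below are common choices in DMP implementations, not values from the cited papers:

```python
import numpy as np

def dmp_rollout(y0, g, weights, tau=1.0, dt=0.01, T=1.0,
                alpha_z=25.0, beta_z=6.25, alpha_s=4.0):
    """Integrate one DMP DOF: tau*v' = alpha_z*(beta_z*(g - p) - v) + f(s), tau*p' = v."""
    weights = np.asarray(weights, dtype=float)
    B = len(weights)
    centers = np.exp(-alpha_s * np.linspace(0.0, 1.0, B))  # basis centers along s
    widths = B ** 1.5 / centers                            # heuristic widths
    p, v, s = float(y0), 0.0, 1.0
    traj = [p]
    for _ in range(int(T / dt)):
        psi = np.exp(-widths * (s - centers) ** 2)
        # Forcing term: normalized weighted sum of Gaussians, gated by s
        f = s * (g - y0) * (psi @ weights) / (psi.sum() + 1e-10)
        v += dt / tau * (alpha_z * (beta_z * (g - p) - v) + f)
        p += dt / tau * v
        s += dt / tau * (-alpha_s * s)  # canonical system drives s -> 0
        traj.append(p)
    return np.array(traj)
```

With zero weights the forcing term vanishes and the trajectory converges smoothly to the goal $g$; nonzero weights shape the transient without affecting the attractor, which is what lets a VLM adjust trajectory shape independently of goal convergence.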

In the LMP context, a vision–language model (VLM) predicts:

  • Subtask decompositions $\{\phi_i\}$ matching language to scene
  • DMP goal offset parameters ($\Delta z$, $\Delta \psi$)
  • DMP weight matrices $W \in \mathbb{R}^{M \times B}$, one per controlled DOF

A "decomposer" model predicts sequential subtasks, and a "generator" infers precise DMP parameters conditioned on subtask, scene, and task-specific grounding prompts (Dai et al., 2 Feb 2026). By doing so, LMPs provide a mechanism for zero-shot manipulation generalization across diverse tasks and object configurations.
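The decomposer/generator pipeline can be sketched as follows; `query_vlm` and the JSON schema are hypothetical stand-ins for whatever VLM interface and output format are actually used:

```python
import json

def plan_lmp(instruction, image, query_vlm):
    """Two-stage LMP generation: decompose the task, then parameterize each subtask.

    `query_vlm(prompt, image) -> str` is an assumed interface returning JSON text.
    """
    # Stage 1: the decomposer maps (language, scene) to an ordered subtask list.
    subtasks = json.loads(query_vlm(
        f"Decompose into an ordered JSON list of subtasks: {instruction}", image))
    # Stage 2: the generator infers DMP parameters for each subtask in turn.
    primitives = []
    for phi in subtasks:
        params = json.loads(query_vlm(
            f"Emit DMP goal offsets and weight matrix as JSON for: {phi}", image))
        primitives.append({"subtask": phi,
                           "goal_offset": params["goal_offset"],  # e.g. (dz, dpsi)
                           "weights": params["weights"]})         # M x B matrix
    return primitives
```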

4. Prompting and Interface Strategies for LMP Generation

To reliably elicit structured LMPs from LLMs, prompt engineering is critical. In hybrid control-based approaches, prompts use a program-function-like structure:

  • Multiple "source" functions encode primitive-to-setpoint mappings.
  • A single "target" function, left partially blank, guides the LLM to emit a new primitive consistent with the learned grammar.
  • Explicit skeletal structure (function names, comments, unit tags) enforces machine-parseability (Cao et al., 2023).
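An illustrative (not verbatim) prompt skeleton following this pattern; the helper names such as `set_task_frame` and `set_setpoints` are hypothetical, standing in for whatever primitive API the actual prompts expose:

```python
# Hypothetical prompt skeleton for eliciting a new hybrid-control primitive.
PROMPT = '''\
# Source primitives (in-context examples)
def wipe_plate():
    """Wipe in the plate frame; units: m, N."""
    set_task_frame(object="plate")
    set_setpoints(pos=[0.0, 0.0, None], force=[None, None, 5.0],
                  mask=[0, 0, 1, 0, 0, 0])  # force control along z only

# Target primitive (to be completed by the LLM)
def insert_straw():
    """Insert the straw; units: m, N."""
'''
```

The partially blank target function, fixed function skeleton, and unit-tagged comments constrain the LLM's completion to the same machine-parseable grammar as the in-context examples.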

For DMP-based LMPs, the prompt includes grounding descriptions (how weight changes affect Cartesian motion) and example subtasks with explicit parameterizations, teaching the VLM the semantic mapping from language ("move higher," "make a circle") to motion-generation parameters (Dai et al., 2 Feb 2026).

This structured prompting is essential: zero-shot prompting yields no success, while in-context few-shot prompting with 3–5 examples boosts correct primitive generation to 70–83% for advanced models such as GPT-4 (Cao et al., 2023).

5. Empirical Performance and Comparative Results

Quantitative evaluation validates the efficacy of LMPs across several axes:

  • Hybrid Task-Frame LMPs (Cao et al., 2023):
    • 30 task benchmark with diverse manipulator primitives
    • LLMs evaluated in zero- to five-shot prompt settings:
      • Zero-shot: 0% success (all models)
      • Five-shot: GPT-4 achieves 83% correct primitives; GPT-3.5 67%; Bard 70%
    • LLMs often fail when in-context examples are lacking or if prompt grammar is not enforced.
  • DMP-based LMPs (Dai et al., 2 Feb 2026):
    • Evaluated on a 7-DoF Franka robot over 20 real-world manipulation tasks.
    • Baselines: LLM-generated dense waypoints (TrajGen), VLA models fine-tuned on demonstrations ($\pi_{0.5}$).
    • LMPs: 80% task success (5 trials per task), outperforming TrajGen (30%) and $\pi_{0.5}$ (31%); ablations (e.g., omitting user feedback or the decomposer) reduce performance markedly.
    • Failure modes: planning errors, trajectory misgeneration, segmentation errors, and hardware failures.

6. Limitations and Future Research Directions

Current LMP frameworks exhibit several constraints:

  • Zero-shot instruction mapping is unreliable; LLMs require diverse prompt examples and precise structure (Cao et al., 2023).
  • Current LMPs focus on static or quasi-static scenes; dynamic obstacle avoidance and objects in motion are unsupported by current pipelines (Dai et al., 2 Feb 2026).
  • Human-in-the-loop correction ("judge") plays a significant role in practical performance; replacing with autonomous evaluators is an open goal.
  • DMP parameterization, while semantically interpretable, may not generalize to complex controllers (torque policies) or highly deformable tasks.
  • Extension to rich temporally extended skills, generalized skill templates, or multi-agent scenarios remains largely unexplored.
  • Expert balancing and specialization are nontrivial, as evidenced by the block-coordinate descent and sparsity regularizers applied in analogous MoMP frameworks for sign language (Saunders et al., 2021).

Future progress will likely involve more robust prompt curricula, autonomous validation loops, expressivity beyond 6-DOF set-points, and broader task benchmarks mapping language directly onto motion-centric representations.

7. Connections to Broader Motion Primitive Research

LMPs generalize and unify several streams in motion primitive research:

  • Classical primitive libraries rely on hand-tuned mappings between symbolic tasks and atomic control routines; LMPs subsume this by predicting both the structure and parameters via language (Cao et al., 2023).
  • Mixture-of-motion-primitives (MoMP) models, as in sign language generation (Saunders et al., 2021), exploit dynamic blending of multiple transformer-expert primitives via gating networks, coupling high-level linguistic structure (e.g. glosses) to frame-wise kinematic predictions. The use of balancing losses and block-coordinate descent echoes the need for specialization and expert load distribution.
  • LMP formulations provide general mechanisms for grounding vision–language understanding into control-theoretic frameworks, enabling a spectrum of applications from industrial manipulation to avatar animation and beyond.

By defining motion primitives as the intermediate substrate between cognition-level task representations and low-level actuation, LMPs represent a scalable, interpretable paradigm for language-conditioned robot behavior synthesis.
