Language to Rewards for Robotic Skill Synthesis
Abstract: LLMs have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot. On the other hand, reward functions have been shown to be flexible representations that can be optimized for control policies to achieve diverse tasks, while their semantic richness makes them suitable to be specified by LLMs. In this work, we introduce a new paradigm that harnesses this realization by using LLMs to define reward parameters that can be optimized to accomplish a variety of robotic tasks. Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions. Meanwhile, combining this with a real-time optimizer, MuJoCo MPC, empowers an interactive behavior creation experience where users can immediately observe the results and provide feedback to the system. To systematically evaluate the performance of our proposed method, we designed a total of 17 tasks for a simulated quadruped robot and a dexterous manipulator robot. We demonstrate that our proposed method reliably tackles 90% of the designed tasks, while a baseline using primitive skills as the interface with Code as Policies achieves 50% of the tasks. We further validate our method on a real robot arm, where complex manipulation skills such as non-prehensile pushing emerge through our interactive system.
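As a rough illustration of the paradigm the abstract describes, the sketch below models a reward specification as a set of named, weighted terms that an LLM (or an interactive user correction) can edit by name, leaving the actual optimization to a downstream solver such as MuJoCo MPC. This is a minimal sketch under stated assumptions; all names here (`RewardTerm`, `RewardSpec`, `torso_height`, `heading`) are hypothetical and are not taken from the paper's actual interface.

```python
from dataclasses import dataclass, field

# Hypothetical reward term: a named quantity with a target value and a
# weight. The key idea is that these parameters are semantically
# meaningful, so an LLM can write or adjust them from language.
@dataclass
class RewardTerm:
    name: str
    target: float
    weight: float = 1.0

    def cost(self, observed: float) -> float:
        # Quadratic penalty for deviating from the target.
        return self.weight * (observed - self.target) ** 2

@dataclass
class RewardSpec:
    terms: dict = field(default_factory=dict)

    def set_term(self, name: str, target: float, weight: float = 1.0) -> None:
        # An LLM response (or a user correction) edits parameters by name,
        # never touching low-level actions directly.
        self.terms[name] = RewardTerm(name, target, weight)

    def total_cost(self, observation: dict) -> float:
        # A real-time optimizer would minimize this over robot actions.
        return sum(t.cost(observation.get(t.name, 0.0))
                   for t in self.terms.values())

# Instruction: "stand 0.3 m tall and face forward."
spec = RewardSpec()
spec.set_term("torso_height", target=0.3, weight=5.0)
spec.set_term("heading", target=0.0, weight=1.0)

# Correction: "a bit taller" only changes one parameter, not the policy.
spec.set_term("torso_height", target=0.4, weight=5.0)

cost = spec.total_cost({"torso_height": 0.35, "heading": 0.1})
```

The point of the sketch is the division of labor: language edits the reward parameters, and a separate optimizer turns the resulting cost into motion, which is what makes corrections interactive.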
- PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311, 2022.
- Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
- Code as policies: Language model programs for embodied control. arXiv preprint arXiv:2209.07753, 2022.
- Socratic models: Composing zero-shot multimodal reasoning with language. arXiv preprint arXiv:2204.00598, 2022.
- Inner monologue: Embodied reasoning through planning with language models. arXiv preprint arXiv:2207.05608, 2022.
- Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903, 2022.
- Large language models are zero-shot reasoners. arXiv preprint arXiv:2205.11916, 2022.
- Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.
- Do as i can, not as i say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691, 2022.
- ChatGPT for robotics: Design principles and model abilities. Microsoft Auton. Syst. Robot. Res., 2:20, 2023.
- Context-aware language modeling for goal-oriented dialogue systems. arXiv preprint arXiv:2204.10198, 2022.
- Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In International Conference on Machine Learning, pages 9118–9147. PMLR, 2022.
- ProgPrompt: Generating situated robot task plans using large language models. arXiv preprint arXiv:2209.11302, 2022.
- BC-Z: Zero-shot task generalization with robotic imitation learning. In 5th Annual Conference on Robot Learning, 2021. URL https://openreview.net/forum?id=8kbp23tSGYv.
- RT-1: Robotics transformer for real-world control at scale. arXiv preprint arXiv:2212.06817, 2022.
- Interactive language: Talking to robots in real time. arXiv preprint arXiv:2210.06407, 2022.
- Robust recovery controller for a quadrupedal robot using deep reinforcement learning. arXiv preprint arXiv:1901.07517, 2019.
- Sim-to-real learning of all common bipedal gaits via periodic reward composition. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 7309–7315. IEEE, 2021.
- ReLMoGen: Leveraging motion generation in reinforcement learning for mobile manipulation. arXiv preprint arXiv:2008.07792, 2020.
- Learning navigation behaviors end-to-end with autorl. IEEE Robotics and Automation Letters, 4(2):2007–2014, 2019.
- Predictive sampling: Real-time behaviour synthesis with MuJoCo. arXiv preprint arXiv:2212.00541, 2022. URL https://arxiv.org/abs/2212.00541.
- Using natural language for reward shaping in reinforcement learning. arXiv preprint arXiv:1903.02020, 2019.
- Inferring rewards from language in context. arXiv preprint arXiv:2204.02515, 2022.
- MineDojo: Building open-ended embodied agents with internet-scale knowledge. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022. URL https://openreview.net/forum?id=rc8o_j8I8PX.
- Correcting robot plans with natural language feedback. In Robotics: Science and Systems (RSS), 2022.
- Reward design with language models. In International Conference on Learning Representations (ICLR), 2023.
- H. Hu and D. Sadigh. Language instructed reinforcement learning for human-ai coordination. In 40th International Conference on Machine Learning (ICML), 2023.
- Translating structured english to robot controllers. Advanced Robotics, 22(12):1343–1359, 2008.
- Learning to parse natural language commands to a robot control system. In Experimental robotics: the 13th international symposium on experimental robotics, pages 403–415. Springer, 2013.
- Room-across-room: Multilingual vision-and-language navigation with dense spatiotemporal grounding. arXiv preprint arXiv:2010.07954, 2020.
- A new path: Scaling vision-and-language navigation with synthetic instructions and imitation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10813–10823, 2023.
- Grounding language with visual affordances over unstructured data. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), London, UK, 2023.
- Bridge data: Boosting generalization of robotic skills with cross-domain datasets. arXiv preprint arXiv:2109.13396, 2021.
- From language to goals: Inverse reinforcement learning for vision-based instruction following. arXiv preprint arXiv:1902.07742, 2019.
- LILA: Language-informed latent actions. In Proceedings of the 5th Conference on Robot Learning (CoRL), 2021.
- Program synthesis with large language models. arXiv preprint arXiv:2108.07732, 2021.
- DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning. arXiv preprint arXiv:2006.08381, 2020.
- Competition-level code generation with alphacode. Science, 378(6624):1092–1097, 2022.
- A large-scale benchmark for few-shot program induction and synthesis. In ICML, 2021.
- Learning abstract structure for drawing by efficient motor program induction. NeurIPS, 2020.
- Learning to synthesize programs as interpretable and generalizable policies. NeurIPS, 2021.
- Learning language-conditioned robot behavior from offline data and crowd-sourced annotation. In A. Faust, D. Hsu, and G. Neumann, editors, Proceedings of the 5th Conference on Robot Learning, volume 164 of Proceedings of Machine Learning Research, pages 1303–1315. PMLR, 08–11 Nov 2022. URL https://proceedings.mlr.press/v164/nair22a.html.
- Learning to understand goal specifications by modelling reward. arXiv preprint arXiv:1806.01946, 2018.
- Real-time natural language corrections for assistive robotic manipulators. International Journal of Robotics Research (IJRR), 36:684–698, 2017.
- “no, to the right”–online language corrections for robotic manipulation via shared autonomy. arXiv preprint arXiv:2301.02555, 2023.
- Reshaping robot trajectories using natural language commands: A study of multi-modal data alignment using transformers. In International Conference on Intelligent Robots and Systems (IROS), pages 978–984, 2022.
- LaTTe: Language trajectory transformer. arXiv preprint arXiv:2208.02918, 2022.
- Large language models are built-in autoregressive search engines. arXiv preprint arXiv:2305.09612, 2023.
- MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033. IEEE, 2012. doi:10.1109/IROS.2012.6386109.
- OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- Learning and adapting agile locomotion skills by transferring experience. arXiv preprint arXiv:2304.09834, 2023.
- Opt-mimic: Imitation of optimized trajectories for dynamic quadruped behaviors. arXiv preprint arXiv:2210.01247, 2022.
- Barkour: Benchmarking animal-level agility with quadruped robots. arXiv preprint arXiv:2305.14654, 2023.
- F-VLM: Open-vocabulary object detection upon frozen vision and language models. arXiv preprint arXiv:2209.15639, 2023.
- E. Olson. AprilTag: A robust and flexible visual fiducial system. In 2011 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2011.