Exploring and Improving the Spatial Reasoning Abilities of Large Language Models
Abstract: LLMs are formidable tools for sequence modeling, with an innate capacity for general pattern recognition. Nevertheless, their broader spatial reasoning capabilities, especially when applied to numerical trajectory data, remain insufficiently explored. In this paper, we investigate the out-of-the-box performance of the ChatGPT-3.5, ChatGPT-4, and Llama 2 7B models when confronted with 3D robotic trajectory data from the CALVIN baseline and associated tasks, including 2D directional and shape labeling. Additionally, we introduce a novel prefix-based prompting mechanism, which yields a 33% improvement on the 3D trajectory data and an increase of up to 10% on SpartQA tasks over zero-shot prompting (with gains for other prompting types as well). The experiments with 3D trajectory data offer an intriguing glimpse into how LLMs engage with numerical and spatial information, laying a foundation for identifying target areas for future improvement.
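To make the idea concrete, a prefix-based prompt for trajectory labeling could be assembled as below. This is a minimal sketch only: the exact prefix wording, trajectory serialization, and question format used in the paper are not given in the abstract, so every string and function name here is an illustrative assumption.

```python
# Hypothetical sketch of prefix-based prompting for 3D trajectory data.
# The prefix text, coordinate format, and question below are illustrative
# assumptions, not the paper's actual prompts.

def format_trajectory(points, precision=2):
    """Serialize a list of (x, y, z) points as plain text for the prompt."""
    return "; ".join(
        f"({x:.{precision}f}, {y:.{precision}f}, {z:.{precision}f})"
        for x, y, z in points
    )

def build_prefix_prompt(points, question):
    """Prepend a task-framing prefix before the numerical trajectory,
    then append the question (otherwise zero-shot)."""
    prefix = (
        "You are given a 3D end-effector trajectory from a robot "
        "manipulation episode, as a sequence of (x, y, z) coordinates."
    )
    return f"{prefix}\nTrajectory: {format_trajectory(points)}\n{question}"

prompt = build_prefix_prompt(
    [(0.0, 0.0, 0.1), (0.0, 0.0, 0.3), (0.0, 0.0, 0.5)],
    "In which direction is the end effector primarily moving?",
)
print(prompt)
```

The prompt string would then be sent unchanged to the model under evaluation; the contrast with zero-shot prompting is simply the presence of the task-framing prefix ahead of the raw numbers.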
- Do As I Can, Not As I Say: Grounding language in robotic affordances, 2022.
- A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity, 2023.
- Language models are few-shot learners, 2020.
- PaLM: Scaling language modeling with pathways, 2022.
- A. G. Cohn and J. Hernández-Orallo. Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs, 2023.
- AnnoLLM: Making large language models to be better crowdsourced annotators, 2023.
- 3D-LLM: Injecting the 3D world into large language models, 2023.
- H. Hu and D. Sadigh. Language instructed reinforcement learning for human-ai coordination, 2023.
- Language models as zero-shot planners: Extracting actionable knowledge for embodied agents, 2022.
- Reward design with language models, 2023.
- Large language models are few-shot health learners, 2023.
- A. Madaan and A. Yazdanbakhsh. Text and patterns: For effective chain of thought, it takes two to tango, 2022.
- CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks, 2022.
- Rethinking the role of demonstrations: What makes in-context learning work?, 2022.
- Large language models as general pattern machines, 2023.
- SPARTQA: A textual question answering benchmark for spatial reasoning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4582–4598, Online, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.364. URL https://aclanthology.org/2021.naacl-main.364.
- OpenAI. GPT-4 technical report, 2023.
- Impact of pretraining term frequencies on few-shot numerical reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 840–854, Abu Dhabi, United Arab Emirates, Dec. 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.findings-emnlp.59. URL https://aclanthology.org/2022.findings-emnlp.59.
- OpenMask3D: Open-vocabulary 3D instance segmentation, 2023.
- Llama 2: Open foundation and fine-tuned chat models, 2023.
- Chain-of-thought prompting elicits reasoning in large language models, 2023.
- An explanation of in-context learning as implicit bayesian inference, 2022.
- Translating natural language to planning goals with large-language models, 2023.
- PointLLM: Empowering large language models to understand point clouds, 2023.
- H. Xue and F. D. Salim. PromptCast: A new prompt-based learning paradigm for time series forecasting, 2023.