When Do Program-of-Thoughts Work for Reasoning?
Abstract: In the realm of embodied artificial intelligence, the reasoning capabilities of LLMs play a pivotal role. Although effective methods such as program-of-thought prompting, which use programming languages to tackle complex reasoning tasks, exist for LLMs, the specific impact of code data on the improvement of reasoning capabilities remains under-explored. To address this gap, we propose the complexity-impacted reasoning score (CIRS), which combines structural and logical attributes to measure the correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree to encode structural information and calculate logical complexity by considering both difficulty and cyclomatic complexity. Through an empirical analysis, we find that not all code data of arbitrary complexity can be learned or understood by LLMs; an optimal level of complexity is critical to improving reasoning abilities via program-aided prompting. We then design an auto-synthesizing and stratifying algorithm and apply it to instruction generation for mathematical reasoning and to code data filtering for code generation tasks. Extensive results demonstrate the effectiveness of our proposed approach. Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
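The abstract names two of CIRS's raw ingredients: structural information encoded from the abstract syntax tree, and logical complexity based on difficulty and McCabe-style cyclomatic complexity. As a minimal sketch of what measuring those ingredients looks like in Python, the snippet below computes a crude structural size (AST node count) and a cyclomatic count for a code sample. The node-type list and helper names here are illustrative assumptions; the paper's actual CIRS formula and weighting are not reproduced.

```python
import ast

# Branching constructs that each add one decision point in a
# McCabe-style cyclomatic count (illustrative subset, not exhaustive).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With,
                ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(code: str) -> int:
    """McCabe-style estimate: 1 + number of branching constructs."""
    tree = ast.parse(code)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))

def structural_size(code: str) -> int:
    """A crude structural signal: total number of AST nodes."""
    return sum(1 for _ in ast.walk(ast.parse(code)))

snippet = """
def solve(nums):
    total = 0
    for x in nums:
        if x > 0:
            total += x
    return total
"""

print(cyclomatic_complexity(snippet))  # one `for` + one `if` -> 3
print(structural_size(snippet))
```

In the paper's framing, scores like these would be computed over a corpus of program-of-thought rationales and then used to stratify the data by complexity, keeping only samples near the optimal level.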