Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams
Abstract: Large language models (LLMs) have demonstrated remarkable performance on a wide range of natural language processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models. This study assesses the financial reasoning capabilities of LLMs. We use mock exam questions from the Chartered Financial Analyst (CFA) Program to conduct a comprehensive evaluation of ChatGPT and GPT-4 in financial analysis under Zero-Shot (ZS), Chain-of-Thought (CoT), and Few-Shot (FS) scenarios. We present an in-depth analysis of the models' performance and limitations, and estimate whether they would have a chance of passing the CFA exams. Finally, we outline potential strategies and improvements to enhance the applicability of LLMs in finance. We hope this work paves the way for future studies to continue enhancing LLMs for financial reasoning through rigorous evaluation.