Kuaiji: the First Chinese Accounting Large Language Model
Abstract: Large Language Models (LLMs) like ChatGPT and GPT-4 have demonstrated impressive proficiency in comprehending and generating natural language. However, they encounter difficulties when adapting to specialized domains such as accounting. To address this challenge, we introduce Kuaiji, a tailored Accounting LLM. Kuaiji is fine-tuned using the Baichuan framework, which encompasses continuous pre-training and supervised fine-tuning processes. Supported by CAtAcctQA, a large dataset of genuine accountant-client dialogues, Kuaiji exhibits exceptional accuracy and response speed. Our contributions encompass the creation of the first Chinese accounting dataset, the establishment of Kuaiji as a leading open-source Chinese accounting LLM, and the validation of its efficacy through real-world accounting scenarios.