Kuaiji: the First Chinese Accounting Large Language Model

Published 21 Feb 2024 in cs.CL and cs.AI | (2402.13866v2)

Abstract: Large language models (LLMs) like ChatGPT and GPT-4 have demonstrated impressive proficiency in comprehending and generating natural language. However, they struggle to adapt to specialized domains such as accounting. To address this challenge, we introduce Kuaiji, a tailored accounting LLM. Kuaiji is fine-tuned using the Baichuan framework, which encompasses continuous pre-training and supervised fine-tuning. Supported by CAtAcctQA, a large dataset of genuine accountant-client dialogues, Kuaiji exhibits exceptional accuracy and response speed. Our contributions are the creation of the first Chinese accounting dataset, the establishment of Kuaiji as a leading open-source Chinese accounting LLM, and the validation of its efficacy in real-world accounting scenarios.
