
Towards Comprehensive Vietnamese Retrieval-Augmented Generation and Large Language Models

Published 3 Mar 2024 in cs.CL (arXiv:2403.01616v2)

Abstract: This paper presents our contributions towards advancing the state of Vietnamese language understanding and generation through the development and dissemination of open datasets and pre-trained models for Vietnamese Retrieval-Augmented Generation (RAG) and LLMs.
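The Retrieval-Augmented Generation setup the abstract refers to can be sketched as follows. This is a minimal, illustrative pipeline, not the paper's actual method: the toy corpus, the whitespace tokenizer, and the bag-of-words cosine scorer are placeholder assumptions (real Vietnamese pipelines typically use word segmentation and dense retrievers), and the final generation step with a Vietnamese LLM is omitted.

```python
# Minimal RAG sketch: retrieve the most relevant passage for a query,
# then build a context-augmented prompt for a downstream LLM.
# All data and scoring choices here are illustrative placeholders.
from collections import Counter
import math


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenizer; real Vietnamese systems segment words.
    return text.lower().split()


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(q, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    # Concatenate retrieved passages as context for the generator.
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


corpus = [
    "Hà Nội là thủ đô của Việt Nam.",  # "Hanoi is the capital of Vietnam."
    "Phở là một món ăn nổi tiếng.",    # "Pho is a famous dish."
]
query = "thủ đô của Việt Nam là gì?"   # "What is the capital of Vietnam?"
passages = retrieve(query, corpus, k=1)
prompt = build_prompt(query, passages)
# `prompt` would then be passed to a Vietnamese LLM; generation is omitted.
```

In a full system the bag-of-words scorer would be replaced by a trained retriever and the prompt fed to one of the pre-trained Vietnamese models the paper releases.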
