Towards Comprehensive Vietnamese Retrieval-Augmented Generation and Large Language Models
Published 3 Mar 2024 in cs.CL (arXiv:2403.01616v2)
Abstract: This paper presents our contributions towards advancing the state of Vietnamese language understanding and generation through the development and dissemination of open datasets and pre-trained models for Vietnamese Retrieval-Augmented Generation (RAG) and LLMs.
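The abstract centers on Retrieval-Augmented Generation, i.e. retrieving relevant passages and prepending them to the prompt before generation. As a rough illustration of that pattern only, the toy sketch below uses bag-of-words cosine similarity in place of a real Vietnamese text encoder; the function names, the two-sentence corpus, and the prompt template are all invented for illustration and are not from the paper.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding': token counts (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=1):
    """Return the top-k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Augment the query with retrieved context before calling an LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "Hanoi is the capital of Vietnam.",
    "PhoBERT is a pre-trained language model for Vietnamese.",
]
query = "What is the capital of Vietnam?"
passages = retrieve(query, corpus, k=1)
prompt = build_prompt(query, passages)
```

In a real RAG system the `embed` step would be a dense encoder and retrieval would run over a vector index; the control flow, however, stays the same.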