TigerBot: An Open Multilingual Multitask LLM
Abstract: We release and introduce the TigerBot family of large language models (LLMs), consisting of base and chat models at 7, 13, 70, and 180 billion parameters. We develop our models starting from Llama-2 and BLOOM, and push the boundary further in data, training algorithms, infrastructure, and application tools. Our models yield meaningful performance gains over state-of-the-art (SOTA) open-source models such as Llama-2: specifically, a 6% gain in English and a 20% gain in Chinese. The TigerBot model family also achieves leading performance on major academic and industrial benchmarks and leaderboards. We believe that TigerBot represents just a snapshot of the lightning-fast progress in the LLM open-source community. We are therefore thrilled to give back by publicly releasing our models and reporting the approach behind them, with particular emphasis on building SOTA LLMs in a democratized way and putting LLMs to use in real-world applications.
- J. Ainslie et al. GQA: Training generalized multi-query transformer models from multi-head checkpoints. arXiv:2305.13245 [cs.CL], 05 2023.
- Anthropic. Claude 2. https://www.anthropic.com/index/claude-2, 06 2023.
- J. Berant et al. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 2013.
- T. Brown et al. Language models are few-shot learners. arXiv:2005.14165v4 [cs.CL], 05 2020.
- H. Chen et al. Walking down the memory maze: Beyond context limit through interactive reading. arXiv:2310.05029 [cs.CL], 10 2023.
- P. Clark et al. Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge. arXiv:1803.05457 [cs.AI], 03 2018.
- OpenCompass Contributors. OpenCompass: A universal evaluation platform for foundation models. GitHub repository, 2023.
- Y. Cui et al. Efficient and effective text encoding for Chinese LLaMA and Alpaca. arXiv:2304.08177 [cs.CL], 04 2023.
- T. Dao et al. FlashAttention: Fast and memory-efficient exact attention with IO-awareness. arXiv:2205.14135 [cs.LG], 05 2022.
- Google. SentencePiece. GitHub repository, 2023.
- E. J. Hu et al. LoRA: Low-rank adaptation of large language models. arXiv:2106.09685 [cs.CL], 06 2021.
- Hugging Face. Text Generation Inference. GitHub repository, 2023.
- Hugging Face. Transformers. GitHub repository, 2023.
- V. Karpukhin et al. Dense passage retrieval for open-domain question answering. In EMNLP 2020, 04 2020.
- C. Li et al. ChatHaruhi: Reviving anime character in reality via large language model. arXiv:2308.09597 [cs.CL], 2023.
- Microsoft. Megatron-DeepSpeed. GitHub repository, 2023.
- D. Narayanan et al. Efficient large-scale language model training on GPU clusters using Megatron-LM. arXiv:2104.04473 [cs.CL], 04 2021.
- NVIDIA. TensorRT open source software. GitHub repository, 2023.
- L. Ouyang et al. Training language models to follow instructions with human feedback. arXiv:2203.02155v1 [cs.CL], 03 2022.
- O. Peckham. Meta completes research supercluster, announces next-gen datacenter. HPCwire: https://www.hpcwire.com/2023/05/18/meta-completes-research-supercluster-announces-next-gen-datacenter/, 05 2023.
- B. Peng et al. YaRN: Efficient context window extension of large language models. arXiv:2309.00071 [cs.CL], 09 2023.
- S. Pichai. An important next step on our ai journey. https://blog.google/technology/ai/bard-google-ai-search-updates/, 02 2023.
- O. Press et al. Train short, test long: Attention with linear biases enables input length extrapolation. arXiv:2108.12409 [cs.CL], 08 2021.
- R. Rafailov et al. Direct preference optimization: Your language model is secretly a reward model. arXiv:2305.18290 [cs.LG], 05 2023.
- S. Rajbhandari et al. ZeRO: Memory optimizations toward training trillion parameter models. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '20); arXiv:1910.02054 [cs.LG], 10 2019.
- P. Rajpurkar et al. Know what you don't know: Unanswerable questions for SQuAD. arXiv:1806.03822 [cs.CL], 06 2018.
- T. Le Scao et al. BLOOM: A 176B-parameter open-access multilingual language model. arXiv:2211.05100 [cs.CL], 11 2022.
- N. Shazeer. GLU variants improve transformer. arXiv:2002.05202 [cs.LG], 02 2020.
- J. Su et al. RoFormer: Enhanced transformer with rotary position embedding. arXiv:2104.09864 [cs.CL], 04 2021.
- A. Talmor et al. CommonsenseQA: A question answering challenge targeting commonsense knowledge. arXiv:1811.00937 [cs.CL], 11 2018.
- R. Taori et al. Stanford Alpaca: An instruction-following LLaMA model. GitHub repository, 2023.
- H. Touvron et al. LLaMA: Open and efficient foundation language models. arXiv:2302.13971 [cs.CL], 02 2023.
- H. Touvron et al. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288 [cs.CL], 07 2023.
- turboderp. ExLlamaV2. GitHub repository, 2023.
- A. Vaswani et al. Attention is all you need. In 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 06 2017.
- J. Wu et al. Recursively summarizing books with human feedback. arXiv:2109.10862 [cs.CL], 09 2021.
- W. Xiong et al. Effective long-context scaling of foundation models. arXiv:2309.16039 [cs.CL], 09 2023.
- Z. Yao et al. ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers. arXiv:2206.01861 [cs.CL], 06 2022.