Universal Language Model Fine-Tuning with Subword Tokenization for Polish
Published 24 Oct 2018 in cs.CL, cs.LG, and stat.ML | arXiv:1810.10222v1
Abstract: Universal Language Model Fine-tuning (ULMFiT, arXiv:1801.06146) is one of the first NLP methods for efficient inductive transfer learning, and its unsupervised pretraining yields improvements on many NLP tasks for English. In this paper, we describe a new method that uses subword tokenization to adapt ULMFiT to languages with high inflection. Our approach sets a new state of the art for the Polish language, taking first place in Task 3 of PolEval'18. After further training, our final model outperformed the second-best model by 35%. We have open-sourced our pretrained models and code.
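The core idea of the abstract, segmenting text into subword units before language-model pretraining so that a morphologically rich language like Polish does not blow up the vocabulary, can be illustrated with SentencePiece, a commonly used subword tokenizer. This is a minimal sketch, not the authors' released code; the corpus path, model prefix, vocabulary size, and other parameters below are illustrative assumptions.

```python
# Illustrative sketch: subword tokenization for a highly inflected language
# using SentencePiece. File names, vocab size, and model type are assumptions
# for demonstration; they are not taken from the paper.
import sentencepiece as spm

# Train a unigram subword model on a raw Polish corpus (one sentence per line).
spm.SentencePieceTrainer.train(
    input="polish_corpus.txt",    # hypothetical pretraining corpus
    model_prefix="polish_sp",     # hypothetical output prefix
    vocab_size=25000,             # illustrative vocabulary size
    model_type="unigram",
    character_coverage=0.9995,    # retain rare characters such as Polish diacritics
)

# Load the trained model and segment text into subword pieces.
sp = spm.SentencePieceProcessor(model_file="polish_sp.model")

pieces = sp.encode("Przykładowe zdanie po polsku.", out_type=str)
ids = sp.encode("Przykładowe zdanie po polsku.", out_type=int)
print(pieces)  # subword pieces, e.g. ['▁Przy', 'kład', 'owe', '▁zdanie', ...]
print(ids)     # corresponding integer ids

# In a ULMFiT-style pipeline, these id sequences would replace a word-level
# vocabulary as input to language-model pretraining and fine-tuning.
```

In this setup, inflected forms that share a stem map to overlapping subword sequences, which is what makes transfer learning tractable for highly inflected languages without an enormous word-level vocabulary.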