
Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages

Published 20 Oct 2020 in cs.CL (arXiv:2010.10472v1)

Abstract: Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict, and large corpora are usually required to collect enough examples. This work compares a neural model and character language models with varying amounts of target-language data. Our usage scenario is interactive correction with nearly zero training examples, improving models as more data is collected, for example within a chat app. Such models are designed to be incrementally improved as feedback is given by users. In this work, we design a system for spelling correction in low-resource languages that embeds a knowledge base and a prediction model. Experimental results on multiple languages show that the model can become effective with a small amount of data. We perform experiments on both natural and synthetic data, as well as on data from two endangered languages (Ainu and Griko). Finally, we built a prototype system that was used for a small case study on Hinglish, which further demonstrated the suitability of our approach in real-world scenarios.
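The interactive loop the abstract describes, correcting with a knowledge base first, falling back to a character model, and growing both from user feedback, can be sketched roughly as follows. This is a hypothetical simplification (dictionary knowledge base, character-bigram scorer, edit-distance-1 candidates), not the paper's actual architecture:

```python
from collections import defaultdict

class InteractiveCorrector:
    """Minimal sketch of interactive knowledge-base spelling correction:
    confirmed corrections are stored exactly, unseen words fall back to a
    character-bigram scorer, and user feedback grows both components.
    This is an illustrative simplification, not the paper's model."""

    def __init__(self):
        self.kb = {}                      # misspelling -> confirmed correction
        self.vocab = set()                # confirmed correct forms
        self.bigrams = defaultdict(int)   # character-bigram counts

    def _score(self, word):
        # Plausibility of a word under the accumulated bigram counts.
        padded = f"^{word}$"
        return sum(self.bigrams[padded[i:i + 2]] for i in range(len(padded) - 1))

    def _edits1(self, word):
        # All strings within edit distance 1 (deletes, inserts, replaces).
        letters = "abcdefghijklmnopqrstuvwxyz"
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [l + r[1:] for l, r in splits if r]
        inserts = [l + c + r for l, r in splits for c in letters]
        replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
        return set(deletes + inserts + replaces)

    def correct(self, word):
        if word in self.kb:
            return self.kb[word]          # exact hit in the knowledge base
        if word in self.vocab:
            return word                   # already a known correct form
        candidates = self._edits1(word) & self.vocab
        if candidates:
            return max(candidates, key=self._score)
        return word                       # no evidence yet; leave unchanged

    def feedback(self, typed, corrected):
        # User confirms a correction: update KB, vocabulary, and bigrams.
        self.kb[typed] = corrected
        self.vocab.add(corrected)
        padded = f"^{corrected}$"
        for i in range(len(padded) - 1):
            self.bigrams[padded[i:i + 2]] += 1
```

With almost no data the corrector simply passes words through; each confirmed correction immediately improves coverage, which mirrors the nearly-zero-shot, incrementally improving setting the abstract targets.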
