Current Landscape of the Russian Sentiment Corpora

Published 28 Jun 2021 in cs.CL | (2106.14434v1)

Abstract: Currently, there are more than a dozen Russian-language corpora for sentiment analysis, differing in text source, domain, size, number and ratio of sentiment classes, and annotation method. This work examines the publicly available Russian-language corpora and presents their qualitative and quantitative characteristics, giving an overview of the current landscape of corpora for sentiment analysis. A ranking of corpora by annotation quality is proposed, which can be useful when choosing corpora for training and testing. The influence of the training dataset on sentiment analysis performance is investigated using the deep neural network model BERT. The experiments with review corpora allow us to conclude that, on average, model quality increases as the number of training corpora increases. For the first time, quality scores based on the BERT model were obtained for the review corpus of the ROMIP seminars. The study also proposes the task of building a universal model for sentiment analysis.
