Using Pseudolabels for training Sentiment Classifiers makes the model generalize better across datasets

Published 5 Oct 2021 in cs.CL and cs.LG | (2110.02200v1)

Abstract: This work addresses the following problem: for a public sentiment classification API, how can we set up a classifier that works well on different types of data when we have limited ability to annotate data from across domains? We show that, given a large amount of unannotated data from different domains and pseudolabels on this data generated by a classifier trained on a small annotated dataset from one domain, we can train a sentiment classifier that generalizes better across different datasets.
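The pipeline described in the abstract can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not the authors' implementation: the `NaiveBayes` class, the example sentences, and the choice to train the second model on pseudolabeled data alone are all hypothetical stand-ins, since the abstract does not specify the model architecture, the datasets, or how the pseudolabeled data is combined with the annotated data.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial Naive Bayes over whitespace tokens (a stand-in classifier)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for w in text.lower().split():
                if w in self.vocab:  # words never seen in training are ignored
                    # Laplace (add-one) smoothing
                    score += math.log(
                        (self.word_counts[c][w] + 1) / (total_words + len(self.vocab))
                    )
            scores[c] = score
        return max(scores, key=scores.get)


# Step 1: a small annotated dataset from ONE domain (hypothetical movie reviews).
labeled_texts = [
    "great movie loved it",
    "terrible movie hated it",
    "loved the acting great plot",
    "hated the plot terrible acting",
]
labels = ["pos", "neg", "pos", "neg"]
teacher = NaiveBayes().fit(labeled_texts, labels)

# Step 2: unannotated data from OTHER domains (hypothetical product and food reviews).
unlabeled_texts = [
    "great phone loved the battery",
    "terrible phone battery died",
    "loved the pizza great service",
    "hated the soup terrible service",
]

# Step 3: the first classifier generates pseudolabels for the unannotated data.
pseudolabels = [teacher.predict(t) for t in unlabeled_texts]

# Step 4: train a new classifier on the pseudolabeled cross-domain data.
# Assumption: pseudolabeled data only; mixing in the annotated set is another option.
student = NaiveBayes().fit(unlabeled_texts, pseudolabels)

print(pseudolabels)
print(student.predict("battery died"))
```

In this toy, the second model picks up vocabulary such as "battery" and "died" that never appears in the annotated movie reviews, which is the intuition behind training on pseudolabeled data from many domains. Real pseudolabels are noisy, so the sketch says nothing about how large the benefit is in practice.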