Unsupervised Machine Learning of Open Source Russian Twitter Data Reveals Global Scope and Operational Characteristics

Published 2 Oct 2018 in cs.SI | (1810.01466v1)

Abstract: We developed and applied a collection of statistical methods (unsupervised machine learning) to extract relevant information from a Twitter-supplied data set of alleged Russian trolls suspected of attempting to influence the 2016 US Presidential election. These unsupervised methods allow fast identification of (i) emergent language communities within the troll population, (ii) the transnational scope of the operation, and (iii) operational characteristics of trolls that can be used for future identification. Using natural language processing, manifold learning and Fourier analysis, we identify an operation that spans not only the 2016 US election but also the French national election and both local and national German elections. We show that the resulting troll population is composed of users with common, but clearly customized, behavioral characteristics.
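The abstract names Fourier analysis as one way to expose operational characteristics such as posting rhythms. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes a user's activity has already been binned into hourly tweet counts, and it uses a hypothetical helper `dominant_period_hours` that returns the period of the strongest non-constant frequency in that series. The input data are synthetic.

```python
import numpy as np


def dominant_period_hours(hourly_counts):
    """Return the period (in hours) of the strongest non-zero frequency
    in an hourly activity series, using the real-input FFT.

    Hypothetical helper for illustration; assumes evenly spaced hourly bins.
    """
    x = np.asarray(hourly_counts, dtype=float)
    x = x - x.mean()  # remove the mean (DC component) so it cannot dominate
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0)  # frequencies in cycles per hour
    k = np.argmax(spectrum[1:]) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[k]


# Synthetic example (not real troll data): four weeks of hourly counts
# with a daily (24-hour) cycle plus Gaussian noise.
hours = np.arange(24 * 28)
rng = np.random.default_rng(0)
counts = 10 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)
print(round(dominant_period_hours(counts), 1))
```

A strong 24-hour peak suggests a shift-like daily schedule. Comparing such spectra across accounts is one plausible way to spot shared but individually tuned behavior. The actual features used in the paper may differ.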