Unveiling Online Conspiracy Theorists: a Text-Based Approach and Characterization

Published 21 May 2024 in cs.SI and cs.CY | (2405.12566v1)

Abstract: In today's digital landscape, the proliferation of conspiracy theories within the disinformation ecosystem of online platforms represents a growing concern. This paper delves into the complexities of this phenomenon. We conducted a comprehensive analysis of two distinct X (formerly known as Twitter) datasets: one comprising users with conspiracy theorizing patterns and another made of users lacking such tendencies and thus serving as a control group. The distinguishing factors between these two groups are explored across three dimensions: emotions, idioms, and linguistic features. Our findings reveal marked differences in the lexicon and language adopted by conspiracy theorists with respect to other users. We developed a machine learning classifier capable of identifying users who propagate conspiracy theories based on a rich set of 871 features. The results demonstrate high accuracy, with an average F1 score of 0.88. Moreover, this paper unveils the most discriminating characteristics that define conspiracy theory propagators.



Summary

The paper demonstrates that text analysis can reliably distinguish conspiracy theorists from generic users, achieving an average F1 score of 0.88.
The study extracts emotional, idiomatic, and linguistic features from tweets to reveal distinct behavioral and writing style differences.
This research lays the groundwork for developing detection tools to better identify and mitigate the spread of online disinformation.

Profiling Online Conspiracy Theorists: A Text-Based Approach

Goals of the Study

This research aims to profile online users who propagate conspiracy theories by analyzing the text of their tweets. The authors worked with two groups: conspiracy theorists and a control group that doesn't exhibit such behavior. The central questions they sought to answer were:

Is it possible to identify a conspiracy user through text alone?
What features differentiate a conspiracy user from a generic user?

Methodology in a Nutshell

The researchers used data from X (formerly Twitter) and created two datasets: one of conspiracy theorists and another of non-conspiracy theorists. They employed a variety of text-based features grouped into three categories:

Emotions: Utilized zero-shot learning to detect eight basic emotions in tweets.
Idioms: Compiled a list of idioms commonly used by conspiracy theorists.
Linguistic Features: Analyzed lexical, syntactical, semantic, structural, and subject-specific characteristics.

Results Summary

The study found that conspiracy theorists and control users exhibit different writing styles, and that a classifier can separate the two groups with high accuracy. The best-performing model achieved an average F1 score of 0.88.

Emotional Features

Emotional content was one aspect the study explored. The zero-shot emotion analysis indicated that conspiracy theorists tend to express more disgust and sadness, while generic users lean towards joy and anger. Although emotional features were less discriminative than the other categories, they still added value to the classification.
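A minimal sketch of how per-tweet emotion scores could be averaged into one feature vector per user. It does not reproduce the study's zero-shot model: `toy_scorer` is a hypothetical keyword-based stand-in, the function names are illustrative, and the label set is assumed to be Plutchik's eight basic emotions, which this summary does not spell out.

```python
from statistics import mean
from typing import Callable, Dict, List

# Assumed label set (Plutchik's eight basic emotions); not listed in the summary.
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

# Any function mapping a tweet to {emotion: score}; in practice this would
# wrap a zero-shot classifier.
EmotionScorer = Callable[[str], Dict[str, float]]


def user_emotion_profile(tweets: List[str], score_tweet: EmotionScorer) -> Dict[str, float]:
    """Average per-tweet emotion scores into one feature vector per user."""
    if not tweets:
        return {emotion: 0.0 for emotion in EMOTIONS}
    per_tweet = [score_tweet(tweet) for tweet in tweets]
    return {e: mean(scores.get(e, 0.0) for scores in per_tweet) for e in EMOTIONS}


def toy_scorer(tweet: str) -> Dict[str, float]:
    """Hypothetical stand-in for a zero-shot model: simple keyword lookup."""
    cues = {"disgust": "disgusting", "joy": "happy"}
    text = tweet.lower()
    return {e: 1.0 if e in cues and cues[e] in text else 0.0 for e in EMOTIONS}


profile = user_emotion_profile(["This is disgusting", "So happy today"], toy_scorer)
```

Swapping `toy_scorer` for a real model only requires a function with the same signature, so the aggregation step stays independent of the emotion detector.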

Idioms of Conspiracy Theorists

The researchers also used idioms characteristic of conspiracy theorists. Some of the phrases they identified included:

"Trust no one"
"The truth is out there"
"Follow the money"

The use of these idioms differed significantly between the two user groups, giving the classifier another signal to draw on.
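Idiom usage can be turned into numeric features in several ways. The sketch below assumes one simple option: for each idiom, the fraction of a user's tweets that contain it. The idiom list holds only the three phrases quoted above, and `idiom_rates` is an illustrative name, not the authors' code.

```python
import re
from typing import Dict, List

# Only the phrases quoted in this summary; the study's full list is not given here.
IDIOMS = ["trust no one", "the truth is out there", "follow the money"]


def idiom_rates(tweets: List[str], idioms: List[str] = IDIOMS) -> Dict[str, float]:
    """Return, for each idiom, the fraction of tweets that contain it."""
    # Case-insensitive match on word boundaries, so "TRUST NO ONE." still counts.
    patterns = {
        idiom: re.compile(r"\b" + re.escape(idiom) + r"\b", re.IGNORECASE)
        for idiom in idioms
    }
    total = len(tweets)
    if total == 0:
        return {idiom: 0.0 for idiom in idioms}
    return {
        idiom: sum(1 for tweet in tweets if pattern.search(tweet)) / total
        for idiom, pattern in patterns.items()
    }


rates = idiom_rates([
    "Follow the money, people!",
    "Nice weather today",
    "TRUST NO ONE. Follow the money.",
])
```

Normalizing by the number of tweets keeps the feature comparable between very active and less active users.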

Linguistic Features

The study's strongest results came from linguistic features. These included:

Lexical Features: Such as the count of unique words and punctuation usage.
Syntactical Features: Like the number of coordinating and subordinating clauses.
Semantic Features: Including the use of pronouns and named entities.
Structural Features: Capturing sentence length and other textual structuring elements.
Subject-specific Features: Focusing on readability indices, indicating the complexity of the language used.
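To make a few of these categories concrete, here is a rough sketch that computes some lexical, semantic, and structural counts with the Python standard library. The feature names, the tokenization rules, and the small pronoun list are assumptions chosen for illustration. Syntactic features and readability indices would normally need a parser or a dedicated library and are left out.

```python
import re
import string
from typing import Dict

# Small illustrative pronoun list; a real feature set would be more complete.
PRONOUNS = {"i", "we", "you", "he", "she", "it", "they",
            "me", "us", "him", "her", "them"}


def text_features(text: str) -> Dict[str, float]:
    """Compute a handful of simple stylistic features for one text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence split on ., ! and ?; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "n_words": len(words),                                          # lexical
        "n_unique_words": len(set(words)),                              # lexical
        "n_punctuation": sum(ch in string.punctuation for ch in text),  # lexical
        "n_pronouns": sum(w in PRONOUNS for w in words),                # semantic
        "avg_sentence_len": len(words) / len(sentences) if sentences else 0.0,  # structural
    }


features = text_features("They lie to us. They always lie!")
```

Each user's tweets could be concatenated or scored one by one and averaged; the summary does not say which aggregation the authors used.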

Strong Numerical Results

The standout performance came from the Light Gradient Boosting Machine (LGBM) classifier:

F1 Score: 0.88 (average)
Precision and Recall: Both high, indicating a reliable model
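For readers less familiar with these metrics, the sketch below shows how precision, recall, and F1 are computed from binary labels, with 1 standing for "conspiracy user". The labels are made-up toy data, not results from the study, and the function name is illustrative.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels for six users (1 = conspiracy user, 0 = control user).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Because F1 is a harmonic mean, it is high only when precision and recall are both high, which is why a single F1 figure is often used to summarize a classifier like this one.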

Implications

Practical Implications

Detection Tools: This research could inform the development of tools to identify conspiracy theorists online, potentially aiding in the control of disinformation spread.
Content Moderation: Social media platforms can leverage these findings to fine-tune their content moderation strategies.

Theoretical Implications

Understanding Psycholinguistics: The study adds depth to our understanding of how language and emotional content can reveal underlying beliefs.
Robust Classifiers: Highlighting the efficacy of text-based classifiers builds a foundation for similar future research.

Speculating on the Future

Given the findings, future research could explore:

Platform Generalization: Testing if these features hold for other social media platforms like Facebook or Reddit.
Temporal Dynamics: Examining how these features evolve over time with changing social and political climates.
Broader Feature Sets: Incorporating more diverse linguistic and contextual features to boost accuracy further.

This study identifies stylistic markers that differentiate conspiracy theorists from other users online. Although it relies on text alone, the findings show promise for tools that detect conspiracy theory propagators and help mitigate the spread of conspiracy theories. The work provides a basis for future research into understanding and managing online disinformation.
